qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tests/functional: fix race in virtio balloon test
@ 2025-03-04 18:33 Daniel P. Berrangé
  2025-03-05  7:58 ` Thomas Huth
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Daniel P. Berrangé @ 2025-03-04 18:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Thomas Huth, David Hildenbrand,
	Daniel P. Berrangé

There are two race conditions in the recently added virtio balloon
test

 * The /dev/vda device node is not ready
 * The virtio-balloon driver has not issued the first stats refresh

To fix the former, monitor dmesg for a line about 'vda'.

To fix the latter, retry the stats query until seeing fresh data.

Adding 'quiet' to the kernel command line reduces serial output
which otherwise slows boot, making it less likely to hit the former
race too.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/test_virtio_balloon.py | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/tests/functional/test_virtio_balloon.py b/tests/functional/test_virtio_balloon.py
index 67b48e1b4e..308d197eb3 100755
--- a/tests/functional/test_virtio_balloon.py
+++ b/tests/functional/test_virtio_balloon.py
@@ -32,7 +32,7 @@ class VirtioBalloonx86(QemuSystemTest):
         'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0')
 
     DEFAULT_KERNEL_PARAMS = ('root=/dev/vda1 console=ttyS0 net.ifnames=0 '
-                             'rd.rescue')
+                             'rd.rescue quiet')
 
     def wait_for_console_pattern(self, success_message, vm=None):
         wait_for_console_pattern(
@@ -47,6 +47,9 @@ def mount_root(self):
         prompt = '# '
         self.wait_for_console_pattern(prompt)
 
+        # Synchronize on virtio-block driver creating the root device
+        exec_command_and_wait_for_pattern(self, "while ! (dmesg -c | grep vda:) ; do sleep 1 ; done", "vda1")
+
         exec_command_and_wait_for_pattern(self, 'mount /dev/vda1 /sysroot',
                                           prompt)
         exec_command_and_wait_for_pattern(self, 'chroot /sysroot',
@@ -65,10 +68,21 @@ def assert_initial_stats(self):
             assert val == UNSET_STATS_VALUE
 
     def assert_running_stats(self, then):
-        ret = self.vm.qmp('qom-get',
-                          {'path': '/machine/peripheral/balloon',
-                           'property': 'guest-stats'})['return']
-        when = ret.get('last-update')
+        # We told the QEMU to refresh stats every 100ms, but
+        # there can be a delay between virtio-ballon driver
+        # being modprobed and seeing the first stats refresh
+        # Retry a few times for robustness under heavy load
+        retries = 10
+        when = 0
+        while when == 0 and retries:
+            ret = self.vm.qmp('qom-get',
+                              {'path': '/machine/peripheral/balloon',
+                               'property': 'guest-stats'})['return']
+            when = ret.get('last-update')
+            if when == 0:
+                retries = retries - 1
+                time.sleep(0.5)
+
         now = time.time()
 
         assert when > then and when < now
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests/functional: fix race in virtio balloon test
  2025-03-04 18:33 [PATCH] tests/functional: fix race in virtio balloon test Daniel P. Berrangé
@ 2025-03-05  7:58 ` Thomas Huth
  2025-03-05 12:25 ` Philippe Mathieu-Daudé
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Thomas Huth @ 2025-03-05  7:58 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel; +Cc: Michael S. Tsirkin, David Hildenbrand

On 04/03/2025 19.33, Daniel P. Berrangé wrote:
> There are two race conditions in the recently added virtio balloon
> test
> 
>   * The /dev/vda device node is not ready
>   * The virtio-balloon driver has not issued the first stats refresh
> 
> To fix the former, monitor dmesg for a line about 'vda'.
> 
> To fix the latter, retry the stats query until seeing fresh data.
> 
> Adding 'quiet' to the kernel command line reduces serial output
> which otherwise slows boot, making it less likely to hit the former
> race too.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/test_virtio_balloon.py | 24 +++++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests/functional: fix race in virtio balloon test
  2025-03-04 18:33 [PATCH] tests/functional: fix race in virtio balloon test Daniel P. Berrangé
  2025-03-05  7:58 ` Thomas Huth
@ 2025-03-05 12:25 ` Philippe Mathieu-Daudé
  2025-03-06 17:42   ` Thomas Huth
  2025-03-07  8:04 ` David Hildenbrand
  2025-04-02 16:27 ` Michael S. Tsirkin
  3 siblings, 1 reply; 8+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-03-05 12:25 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Michael S. Tsirkin, Thomas Huth, David Hildenbrand

Hi Daniel,

On 4/3/25 19:33, Daniel P. Berrangé wrote:
> There are two race conditions in the recently added virtio balloon
> test
> 
>   * The /dev/vda device node is not ready
>   * The virtio-balloon driver has not issued the first stats refresh
> 
> To fix the former, monitor dmesg for a line about 'vda'.
> 
> To fix the latter, retry the stats query until seeing fresh data.
> 
> Adding 'quiet' to the kernel command line reduces serial output
> which otherwise slows boot, making it less likely to hit the former
> race too.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/test_virtio_balloon.py | 24 +++++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/functional/test_virtio_balloon.py b/tests/functional/test_virtio_balloon.py
> index 67b48e1b4e..308d197eb3 100755
> --- a/tests/functional/test_virtio_balloon.py
> +++ b/tests/functional/test_virtio_balloon.py
> @@ -32,7 +32,7 @@ class VirtioBalloonx86(QemuSystemTest):
>           'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0')
>   
>       DEFAULT_KERNEL_PARAMS = ('root=/dev/vda1 console=ttyS0 net.ifnames=0 '
> -                             'rd.rescue')
> +                             'rd.rescue quiet')
>   
>       def wait_for_console_pattern(self, success_message, vm=None):
>           wait_for_console_pattern(
> @@ -47,6 +47,9 @@ def mount_root(self):
>           prompt = '# '
>           self.wait_for_console_pattern(prompt)
>   
> +        # Synchronize on virtio-block driver creating the root device
> +        exec_command_and_wait_for_pattern(self, "while ! (dmesg -c | grep vda:) ; do sleep 1 ; done", "vda1")
> +
>           exec_command_and_wait_for_pattern(self, 'mount /dev/vda1 /sysroot',
>                                             prompt)
>           exec_command_and_wait_for_pattern(self, 'chroot /sysroot',
> @@ -65,10 +68,21 @@ def assert_initial_stats(self):
>               assert val == UNSET_STATS_VALUE
>   
>       def assert_running_stats(self, then):
> -        ret = self.vm.qmp('qom-get',
> -                          {'path': '/machine/peripheral/balloon',
> -                           'property': 'guest-stats'})['return']
> -        when = ret.get('last-update')
> +        # We told the QEMU to refresh stats every 100ms, but
> +        # there can be a delay between virtio-ballon driver
> +        # being modprobed and seeing the first stats refresh
> +        # Retry a few times for robustness under heavy load
> +        retries = 10
> +        when = 0
> +        while when == 0 and retries:
> +            ret = self.vm.qmp('qom-get',
> +                              {'path': '/machine/peripheral/balloon',
> +                               'property': 'guest-stats'})['return']
> +            when = ret.get('last-update')
> +            if when == 0:
> +                retries = retries - 1
> +                time.sleep(0.5)
> +
>           now = time.time()
>   
>           assert when > then and when < now

Unfortunately I'm still getting a timeout:
https://gitlab.com/philmd/qemu/-/jobs/9318095233

2025-03-05 12:09:55,360 - DEBUG: Console interaction: 
success_msg='Entering emergency mode.' failure_msg='Kernel panic - not 
syncing' send_string='None'
2025-03-05 12:09:55,360 - DEBUG: Opening console socket
2025-03-05 12:10:32,722 - DEBUG: Console interaction: success_msg='# ' 
failure_msg='Kernel panic - not syncing' send_string='None'
2025-03-05 12:10:32,823 - DEBUG: Console interaction: success_msg='vda1' 
failure_msg='None' send_string='while ! (dmesg -c | grep vda:) ; do 
sleep 1 ; done

2025-03-05 12:10:30,534: Warning: /dev/vda1 does not exist
2025-03-05 12:10:30,535:
2025-03-05 12:10:30,598: Generating "/run/initramfs/rdsosreport.txt"
2025-03-05 12:10:32,720:
2025-03-05 12:10:32,721:
2025-03-05 12:10:32,722: Entering emergency mode.
2025-03-05 12:10:32,724: Exit the shell to continue.
2025-03-05 12:10:32,726: Type "journalctl" to view system logs.
2025-03-05 12:10:32,727: You might want to save 
"/run/initramfs/rdsosreport.txt" to a USB stick or /boot
2025-03-05 12:10:32,728: after mounting them and attach it to a bug report.
2025-03-05 12:10:32,729:
2025-03-05 12:10:32,731:
2025-03-05 12:10:32,823: :/#



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests/functional: fix race in virtio balloon test
  2025-03-05 12:25 ` Philippe Mathieu-Daudé
@ 2025-03-06 17:42   ` Thomas Huth
  2025-03-06 19:23     ` Thomas Huth
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Huth @ 2025-03-06 17:42 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Daniel P. Berrangé, qemu-devel
  Cc: Michael S. Tsirkin, David Hildenbrand

On 05/03/2025 13.25, Philippe Mathieu-Daudé wrote:
> Hi Daniel,
> 
> On 4/3/25 19:33, Daniel P. Berrangé wrote:
>> There are two race conditions in the recently added virtio balloon
>> test
>>
>>   * The /dev/vda device node is not ready
>>   * The virtio-balloon driver has not issued the first stats refresh
>>
>> To fix the former, monitor dmesg for a line about 'vda'.
>>
>> To fix the latter, retry the stats query until seeing fresh data.
>>
>> Adding 'quiet' to the kernel command line reduces serial output
>> which otherwise slows boot, making it less likely to hit the former
>> race too.
>>
>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>> ---
>>   tests/functional/test_virtio_balloon.py | 24 +++++++++++++++++++-----
>>   1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/tests/functional/test_virtio_balloon.py b/tests/functional/ 
>> test_virtio_balloon.py
>> index 67b48e1b4e..308d197eb3 100755
>> --- a/tests/functional/test_virtio_balloon.py
>> +++ b/tests/functional/test_virtio_balloon.py
>> @@ -32,7 +32,7 @@ class VirtioBalloonx86(QemuSystemTest):
>>           'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0')
>>       DEFAULT_KERNEL_PARAMS = ('root=/dev/vda1 console=ttyS0 net.ifnames=0 '
>> -                             'rd.rescue')
>> +                             'rd.rescue quiet')
>>       def wait_for_console_pattern(self, success_message, vm=None):
>>           wait_for_console_pattern(
>> @@ -47,6 +47,9 @@ def mount_root(self):
>>           prompt = '# '
>>           self.wait_for_console_pattern(prompt)
>> +        # Synchronize on virtio-block driver creating the root device
>> +        exec_command_and_wait_for_pattern(self, "while ! (dmesg -c | grep 
>> vda:) ; do sleep 1 ; done", "vda1")
>> +
>>           exec_command_and_wait_for_pattern(self, 'mount /dev/vda1 /sysroot',
>>                                             prompt)
>>           exec_command_and_wait_for_pattern(self, 'chroot /sysroot',
>> @@ -65,10 +68,21 @@ def assert_initial_stats(self):
>>               assert val == UNSET_STATS_VALUE
>>       def assert_running_stats(self, then):
>> -        ret = self.vm.qmp('qom-get',
>> -                          {'path': '/machine/peripheral/balloon',
>> -                           'property': 'guest-stats'})['return']
>> -        when = ret.get('last-update')
>> +        # We told the QEMU to refresh stats every 100ms, but
>> +        # there can be a delay between virtio-ballon driver
>> +        # being modprobed and seeing the first stats refresh
>> +        # Retry a few times for robustness under heavy load
>> +        retries = 10
>> +        when = 0
>> +        while when == 0 and retries:
>> +            ret = self.vm.qmp('qom-get',
>> +                              {'path': '/machine/peripheral/balloon',
>> +                               'property': 'guest-stats'})['return']
>> +            when = ret.get('last-update')
>> +            if when == 0:
>> +                retries = retries - 1
>> +                time.sleep(0.5)
>> +
>>           now = time.time()
>>           assert when > then and when < now
> 
> Unfortunately I'm still getting a timeout:
> https://gitlab.com/philmd/qemu/-/jobs/9318095233
> 
> 2025-03-05 12:09:55,360 - DEBUG: Console interaction: success_msg='Entering 
> emergency mode.' failure_msg='Kernel panic - not syncing' send_string='None'
> 2025-03-05 12:09:55,360 - DEBUG: Opening console socket
> 2025-03-05 12:10:32,722 - DEBUG: Console interaction: success_msg='# ' 
> failure_msg='Kernel panic - not syncing' send_string='None'
> 2025-03-05 12:10:32,823 - DEBUG: Console interaction: success_msg='vda1' 
> failure_msg='None' send_string='while ! (dmesg -c | grep vda:) ; do sleep 
> 1 ; done
> 
> 2025-03-05 12:10:30,534: Warning: /dev/vda1 does not exist
> 2025-03-05 12:10:30,535:
> 2025-03-05 12:10:30,598: Generating "/run/initramfs/rdsosreport.txt"
> 2025-03-05 12:10:32,720:
> 2025-03-05 12:10:32,721:
> 2025-03-05 12:10:32,722: Entering emergency mode.
> 2025-03-05 12:10:32,724: Exit the shell to continue.
> 2025-03-05 12:10:32,726: Type "journalctl" to view system logs.
> 2025-03-05 12:10:32,727: You might want to save "/run/initramfs/ 
> rdsosreport.txt" to a USB stick or /boot
> 2025-03-05 12:10:32,728: after mounting them and attach it to a bug report.
> 2025-03-05 12:10:32,729:
> 2025-03-05 12:10:32,731:
> 2025-03-05 12:10:32,823: :/#

Same for me, it always seems to hang when being run with the gitlab shared 
runners:

  https://gitlab.com/thuth/qemu/-/jobs/9333926038#L612
  https://gitlab.com/thuth/qemu/-/jobs/9333926046#L625

... no clue what's still going wrong, though ...

  Thomas



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests/functional: fix race in virtio balloon test
  2025-03-06 17:42   ` Thomas Huth
@ 2025-03-06 19:23     ` Thomas Huth
  2025-03-07  8:02       ` Daniel P. Berrangé
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Huth @ 2025-03-06 19:23 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Daniel P. Berrangé, qemu-devel
  Cc: Michael S. Tsirkin, David Hildenbrand

On 06/03/2025 18.42, Thomas Huth wrote:
> On 05/03/2025 13.25, Philippe Mathieu-Daudé wrote:
>> Hi Daniel,
>>
>> On 4/3/25 19:33, Daniel P. Berrangé wrote:
>>> There are two race conditions in the recently added virtio balloon
>>> test
>>>
>>>   * The /dev/vda device node is not ready
>>>   * The virtio-balloon driver has not issued the first stats refresh
>>>
>>> To fix the former, monitor dmesg for a line about 'vda'.
>>>
>>> To fix the latter, retry the stats query until seeing fresh data.
>>>
>>> Adding 'quiet' to the kernel command line reduces serial output
>>> which otherwise slows boot, making it less likely to hit the former
>>> race too.
>>>
>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>> ---
>>>   tests/functional/test_virtio_balloon.py | 24 +++++++++++++++++++-----
>>>   1 file changed, 19 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/tests/functional/test_virtio_balloon.py b/tests/functional/ 
>>> test_virtio_balloon.py
>>> index 67b48e1b4e..308d197eb3 100755
>>> --- a/tests/functional/test_virtio_balloon.py
>>> +++ b/tests/functional/test_virtio_balloon.py
>>> @@ -32,7 +32,7 @@ class VirtioBalloonx86(QemuSystemTest):
>>>           
>>> 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0')
>>>       DEFAULT_KERNEL_PARAMS = ('root=/dev/vda1 console=ttyS0 net.ifnames=0 '
>>> -                             'rd.rescue')
>>> +                             'rd.rescue quiet')
>>>       def wait_for_console_pattern(self, success_message, vm=None):
>>>           wait_for_console_pattern(
>>> @@ -47,6 +47,9 @@ def mount_root(self):
>>>           prompt = '# '
>>>           self.wait_for_console_pattern(prompt)
>>> +        # Synchronize on virtio-block driver creating the root device
>>> +        exec_command_and_wait_for_pattern(self, "while ! (dmesg -c | 
>>> grep vda:) ; do sleep 1 ; done", "vda1")
>>> +
>>>           exec_command_and_wait_for_pattern(self, 'mount /dev/vda1 / 
>>> sysroot',
>>>                                             prompt)
>>>           exec_command_and_wait_for_pattern(self, 'chroot /sysroot',
>>> @@ -65,10 +68,21 @@ def assert_initial_stats(self):
>>>               assert val == UNSET_STATS_VALUE
>>>       def assert_running_stats(self, then):
>>> -        ret = self.vm.qmp('qom-get',
>>> -                          {'path': '/machine/peripheral/balloon',
>>> -                           'property': 'guest-stats'})['return']
>>> -        when = ret.get('last-update')
>>> +        # We told the QEMU to refresh stats every 100ms, but
>>> +        # there can be a delay between virtio-ballon driver
>>> +        # being modprobed and seeing the first stats refresh
>>> +        # Retry a few times for robustness under heavy load
>>> +        retries = 10
>>> +        when = 0
>>> +        while when == 0 and retries:
>>> +            ret = self.vm.qmp('qom-get',
>>> +                              {'path': '/machine/peripheral/balloon',
>>> +                               'property': 'guest-stats'})['return']
>>> +            when = ret.get('last-update')
>>> +            if when == 0:
>>> +                retries = retries - 1
>>> +                time.sleep(0.5)
>>> +
>>>           now = time.time()
>>>           assert when > then and when < now
>>
>> Unfortunately I'm still getting a timeout:
>> https://gitlab.com/philmd/qemu/-/jobs/9318095233
>>
>> 2025-03-05 12:09:55,360 - DEBUG: Console interaction: 
>> success_msg='Entering emergency mode.' failure_msg='Kernel panic - not 
>> syncing' send_string='None'
>> 2025-03-05 12:09:55,360 - DEBUG: Opening console socket
>> 2025-03-05 12:10:32,722 - DEBUG: Console interaction: success_msg='# ' 
>> failure_msg='Kernel panic - not syncing' send_string='None'
>> 2025-03-05 12:10:32,823 - DEBUG: Console interaction: success_msg='vda1' 
>> failure_msg='None' send_string='while ! (dmesg -c | grep vda:) ; do sleep 
>> 1 ; done
>>
>> 2025-03-05 12:10:30,534: Warning: /dev/vda1 does not exist
>> 2025-03-05 12:10:30,535:
>> 2025-03-05 12:10:30,598: Generating "/run/initramfs/rdsosreport.txt"
>> 2025-03-05 12:10:32,720:
>> 2025-03-05 12:10:32,721:
>> 2025-03-05 12:10:32,722: Entering emergency mode.
>> 2025-03-05 12:10:32,724: Exit the shell to continue.
>> 2025-03-05 12:10:32,726: Type "journalctl" to view system logs.
>> 2025-03-05 12:10:32,727: You might want to save "/run/initramfs/ 
>> rdsosreport.txt" to a USB stick or /boot
>> 2025-03-05 12:10:32,728: after mounting them and attach it to a bug report.
>> 2025-03-05 12:10:32,729:
>> 2025-03-05 12:10:32,731:
>> 2025-03-05 12:10:32,823: :/#
> 
> Same for me, it always seems to hang when being run with the gitlab shared 
> runners:
> 
>   https://gitlab.com/thuth/qemu/-/jobs/9333926038#L612
>   https://gitlab.com/thuth/qemu/-/jobs/9333926046#L625
> 
> ... no clue what's still going wrong, though ...

... but I just noticed that all other functional tests that use the same 
assets are using:

         self.require_accelerator('kvm')
         self.vm.add_args('-accel', 'kvm')

so they are skipped on the gitlab shared runners (but still executed in the 
custom runners of the qemu-project), while your test also is enabled for TCG 
and thus runs in the shared runners, too.
So unless you've got a clue what's going wrong here (I fail to see the 
reason for the problem unfortunately), I'd suggest that we mark the 
virtio_balloon test with require_accelerator('kvm'), too, to get the CI 
working with the shared runners again. WDYT?

  Thomas



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests/functional: fix race in virtio balloon test
  2025-03-06 19:23     ` Thomas Huth
@ 2025-03-07  8:02       ` Daniel P. Berrangé
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel P. Berrangé @ 2025-03-07  8:02 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Philippe Mathieu-Daudé, qemu-devel, Michael S. Tsirkin,
	David Hildenbrand

On Thu, Mar 06, 2025 at 08:23:15PM +0100, Thomas Huth wrote:
> On 06/03/2025 18.42, Thomas Huth wrote:
> > On 05/03/2025 13.25, Philippe Mathieu-Daudé wrote:
> > > Hi Daniel,
> > > 
> > > On 4/3/25 19:33, Daniel P. Berrangé wrote:
> > > > There are two race conditions in the recently added virtio balloon
> > > > test
> > > > 
> > > >   * The /dev/vda device node is not ready
> > > >   * The virtio-balloon driver has not issued the first stats refresh
> > > > 
> > > > To fix the former, monitor dmesg for a line about 'vda'.
> > > > 
> > > > To fix the latter, retry the stats query until seeing fresh data.
> > > > 
> > > > Adding 'quiet' to the kernel command line reduces serial output
> > > > which otherwise slows boot, making it less likely to hit the former
> > > > race too.
> > > > 
> > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > ---
> > > >   tests/functional/test_virtio_balloon.py | 24 +++++++++++++++++++-----
> > > >   1 file changed, 19 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/tests/functional/test_virtio_balloon.py
> > > > b/tests/functional/ test_virtio_balloon.py
> > > > index 67b48e1b4e..308d197eb3 100755
> > > > --- a/tests/functional/test_virtio_balloon.py
> > > > +++ b/tests/functional/test_virtio_balloon.py
> > > > @@ -32,7 +32,7 @@ class VirtioBalloonx86(QemuSystemTest):
> > > > 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0')
> > > >       DEFAULT_KERNEL_PARAMS = ('root=/dev/vda1 console=ttyS0 net.ifnames=0 '
> > > > -                             'rd.rescue')
> > > > +                             'rd.rescue quiet')
> > > >       def wait_for_console_pattern(self, success_message, vm=None):
> > > >           wait_for_console_pattern(
> > > > @@ -47,6 +47,9 @@ def mount_root(self):
> > > >           prompt = '# '
> > > >           self.wait_for_console_pattern(prompt)
> > > > +        # Synchronize on virtio-block driver creating the root device
> > > > +        exec_command_and_wait_for_pattern(self, "while ! (dmesg
> > > > -c | grep vda:) ; do sleep 1 ; done", "vda1")
> > > > +
> > > >           exec_command_and_wait_for_pattern(self, 'mount
> > > > /dev/vda1 / sysroot',
> > > >                                             prompt)
> > > >           exec_command_and_wait_for_pattern(self, 'chroot /sysroot',
> > > > @@ -65,10 +68,21 @@ def assert_initial_stats(self):
> > > >               assert val == UNSET_STATS_VALUE
> > > >       def assert_running_stats(self, then):
> > > > -        ret = self.vm.qmp('qom-get',
> > > > -                          {'path': '/machine/peripheral/balloon',
> > > > -                           'property': 'guest-stats'})['return']
> > > > -        when = ret.get('last-update')
> > > > +        # We told the QEMU to refresh stats every 100ms, but
> > > > +        # there can be a delay between virtio-ballon driver
> > > > +        # being modprobed and seeing the first stats refresh
> > > > +        # Retry a few times for robustness under heavy load
> > > > +        retries = 10
> > > > +        when = 0
> > > > +        while when == 0 and retries:
> > > > +            ret = self.vm.qmp('qom-get',
> > > > +                              {'path': '/machine/peripheral/balloon',
> > > > +                               'property': 'guest-stats'})['return']
> > > > +            when = ret.get('last-update')
> > > > +            if when == 0:
> > > > +                retries = retries - 1
> > > > +                time.sleep(0.5)
> > > > +
> > > >           now = time.time()
> > > >           assert when > then and when < now
> > > 
> > > Unfortunately I'm still getting a timeout:
> > > https://gitlab.com/philmd/qemu/-/jobs/9318095233
> > > 
> > > 2025-03-05 12:09:55,360 - DEBUG: Console interaction:
> > > success_msg='Entering emergency mode.' failure_msg='Kernel panic -
> > > not syncing' send_string='None'
> > > 2025-03-05 12:09:55,360 - DEBUG: Opening console socket
> > > 2025-03-05 12:10:32,722 - DEBUG: Console interaction: success_msg='#
> > > ' failure_msg='Kernel panic - not syncing' send_string='None'
> > > 2025-03-05 12:10:32,823 - DEBUG: Console interaction:
> > > success_msg='vda1' failure_msg='None' send_string='while ! (dmesg -c
> > > | grep vda:) ; do sleep 1 ; done
> > > 
> > > 2025-03-05 12:10:30,534: Warning: /dev/vda1 does not exist
> > > 2025-03-05 12:10:30,535:
> > > 2025-03-05 12:10:30,598: Generating "/run/initramfs/rdsosreport.txt"
> > > 2025-03-05 12:10:32,720:
> > > 2025-03-05 12:10:32,721:
> > > 2025-03-05 12:10:32,722: Entering emergency mode.
> > > 2025-03-05 12:10:32,724: Exit the shell to continue.
> > > 2025-03-05 12:10:32,726: Type "journalctl" to view system logs.
> > > 2025-03-05 12:10:32,727: You might want to save "/run/initramfs/
> > > rdsosreport.txt" to a USB stick or /boot
> > > 2025-03-05 12:10:32,728: after mounting them and attach it to a bug report.
> > > 2025-03-05 12:10:32,729:
> > > 2025-03-05 12:10:32,731:
> > > 2025-03-05 12:10:32,823: :/#
> > 
> > Same for me, it always seems to hang when being run with the gitlab
> > shared runners:
> > 
> >   https://gitlab.com/thuth/qemu/-/jobs/9333926038#L612
> >   https://gitlab.com/thuth/qemu/-/jobs/9333926046#L625
> > 
> > ... no clue what's still going wrong, though ...
> 
> ... but I just noticed that all other functional tests that use the same
> assets are using:
> 
>         self.require_accelerator('kvm')
>         self.vm.add_args('-accel', 'kvm')

Hmm, yes, and my testing locally will be with kvm too.

> so they are skipped on the gitlab shared runners (but still executed in the
> custom runners of the qemu-project), while your test also is enabled for TCG
> and thus runs in the shared runners, too.
> So unless you've got a clue what's going wrong here (I fail to see the
> reason for the problem unfortunately), I'd suggest that we mark the
> virtio_balloon test with require_accelerator('kvm'), too, to get the CI
> working with the shared runners again. WDYT?

Lets do that for now

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests/functional: fix race in virtio balloon test
  2025-03-04 18:33 [PATCH] tests/functional: fix race in virtio balloon test Daniel P. Berrangé
  2025-03-05  7:58 ` Thomas Huth
  2025-03-05 12:25 ` Philippe Mathieu-Daudé
@ 2025-03-07  8:04 ` David Hildenbrand
  2025-04-02 16:27 ` Michael S. Tsirkin
  3 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2025-03-07  8:04 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel; +Cc: Michael S. Tsirkin, Thomas Huth

On 04.03.25 19:33, Daniel P. Berrangé wrote:
> There are two race conditions in the recently added virtio balloon
> test
> 
>   * The /dev/vda device node is not ready
>   * The virtio-balloon driver has not issued the first stats refresh
> 
> To fix the former, monitor dmesg for a line about 'vda'.
> 
> To fix the latter, retry the stats query until seeing fresh data.
> 
> Adding 'quiet' to the kernel command line reduces serial output
> which otherwise slows boot, making it less likely to hit the former
> race too.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests/functional: fix race in virtio balloon test
  2025-03-04 18:33 [PATCH] tests/functional: fix race in virtio balloon test Daniel P. Berrangé
                   ` (2 preceding siblings ...)
  2025-03-07  8:04 ` David Hildenbrand
@ 2025-04-02 16:27 ` Michael S. Tsirkin
  3 siblings, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2025-04-02 16:27 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, Thomas Huth, David Hildenbrand

On Tue, Mar 04, 2025 at 06:33:40PM +0000, Daniel P. Berrangé wrote:
> There are two race conditions in the recently added virtio balloon
> test
> 
>  * The /dev/vda device node is not ready
>  * The virtio-balloon driver has not issued the first stats refresh
> 
> To fix the former, monitor dmesg for a line about 'vda'.
> 
> To fix the latter, retry the stats query until seeing fresh data.
> 
> Adding 'quiet' to the kernel command line reduces serial output
> which otherwise slows boot, making it less likely to hit the former
> race too.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

ok

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  tests/functional/test_virtio_balloon.py | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/functional/test_virtio_balloon.py b/tests/functional/test_virtio_balloon.py
> index 67b48e1b4e..308d197eb3 100755
> --- a/tests/functional/test_virtio_balloon.py
> +++ b/tests/functional/test_virtio_balloon.py
> @@ -32,7 +32,7 @@ class VirtioBalloonx86(QemuSystemTest):
>          'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0')
>  
>      DEFAULT_KERNEL_PARAMS = ('root=/dev/vda1 console=ttyS0 net.ifnames=0 '
> -                             'rd.rescue')
> +                             'rd.rescue quiet')
>  
>      def wait_for_console_pattern(self, success_message, vm=None):
>          wait_for_console_pattern(
> @@ -47,6 +47,9 @@ def mount_root(self):
>          prompt = '# '
>          self.wait_for_console_pattern(prompt)
>  
> +        # Synchronize on virtio-block driver creating the root device
> +        exec_command_and_wait_for_pattern(self, "while ! (dmesg -c | grep vda:) ; do sleep 1 ; done", "vda1")
> +
>          exec_command_and_wait_for_pattern(self, 'mount /dev/vda1 /sysroot',
>                                            prompt)
>          exec_command_and_wait_for_pattern(self, 'chroot /sysroot',
> @@ -65,10 +68,21 @@ def assert_initial_stats(self):
>              assert val == UNSET_STATS_VALUE
>  
>      def assert_running_stats(self, then):
> -        ret = self.vm.qmp('qom-get',
> -                          {'path': '/machine/peripheral/balloon',
> -                           'property': 'guest-stats'})['return']
> -        when = ret.get('last-update')
> +        # We told the QEMU to refresh stats every 100ms, but
> +        # there can be a delay between virtio-ballon driver
> +        # being modprobed and seeing the first stats refresh
> +        # Retry a few times for robustness under heavy load
> +        retries = 10
> +        when = 0
> +        while when == 0 and retries:
> +            ret = self.vm.qmp('qom-get',
> +                              {'path': '/machine/peripheral/balloon',
> +                               'property': 'guest-stats'})['return']
> +            when = ret.get('last-update')
> +            if when == 0:
> +                retries = retries - 1
> +                time.sleep(0.5)
> +
>          now = time.time()
>  
>          assert when > then and when < now
> -- 
> 2.48.1



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-04-02 16:28 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-04 18:33 [PATCH] tests/functional: fix race in virtio balloon test Daniel P. Berrangé
2025-03-05  7:58 ` Thomas Huth
2025-03-05 12:25 ` Philippe Mathieu-Daudé
2025-03-06 17:42   ` Thomas Huth
2025-03-06 19:23     ` Thomas Huth
2025-03-07  8:02       ` Daniel P. Berrangé
2025-03-07  8:04 ` David Hildenbrand
2025-04-02 16:27 ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).