qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* iotest40 problem
@ 2022-03-24 13:47 Li Zhang
  2022-03-25 10:17 ` Li Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Li Zhang @ 2022-03-24 13:47 UTC (permalink / raw)
  To: QEMU Developers

Hi,

When I run the testsuit on our buidling system, it reports a timeout 
sometimes not always as the following.
It couldn't connect qmp socket. Any ideas about this problem?


[ 1989s] --- 
/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
[ 1989s] +++ 040.out.bad
[ 1989s] @@ -1,5 +1,55 @@
[ 1989s] -.................................................................
[ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to establish 
connection: asyncio.exceptions.CancelledError
[ 1989s] 
+E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed 
to establish connection: asyncio.exceptions.CancelledError
[ 1989s] +E.........................
[ 1989s] 
+======================================================================
[ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage)
[ 1989s] 
+----------------------------------------------------------------------
[ 1989s] +Traceback (most recent call last):
[ 1989s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", line 
94, in setUp
[ 1989s] +    self.vm.launch()
[ 1989s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
line 399, in launch
[ 1989s] +    self._launch()
[ 1989s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
line 434, in _launch
[ 1989s] +    self._post_launch()
[ 1989s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", 
line 147, in _post_launch
[ 1989s] +    super()._post_launch()
[ 1989s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
line 340, in _post_launch
[ 1989s] +    self._qmp.accept(self._qmp_timer)
[ 1989s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
line 67, in accept
[ 1989s] +    self._sync(
[ 1989s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
line 41, in _sync
[ 1989s] +    return self._aloop.run_until_complete(
[ 1989s] +  File "/usr/lib64/python3.8/asyncio/base_events.py", line 
616, in run_until_complete
[ 1989s] +    return future.result()
[ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, in 
wait_for
[ 1989s] +    raise exceptions.TimeoutError()
[ 1989s] +asyncio.exceptions.TimeoutError


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-24 13:47 iotest40 problem Li Zhang
@ 2022-03-25 10:17 ` Li Zhang
  2022-03-29 16:49   ` Li Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Li Zhang @ 2022-03-25 10:17 UTC (permalink / raw)
  To: QEMU Developers, kwolf, Hanna Reitz

Hi,

I backport some iotests patches to the tree and change timeout.
It doesn't work.

Sometimes, iotest41 also reports the errors.
[ 1347s] 
+======================================================================
[ 1347s] +ERROR: test_top_node_in_wrong_chain (__main__.TestSingleDrive)
[ 1347s] 
+----------------------------------------------------------------------
[ 1347s] +Traceback (most recent call last):
[ 1347s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
line 399, in launch
[ 1347s] +    self._launch()
[ 1347s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
line 434, in _launch
[ 1347s] +    self._post_launch()
[ 1347s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", 
line 147, in _post_launch
[ 1347s] +    super()._post_launch()
[ 1347s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
line 340, in _post_launch
[ 1347s] +    self._qmp.accept(self._qmp_timer)
[ 1347s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
line 69, in accept
[ 1347s] +    timeout
[ 1347s] +  File 
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
line 42, in _sync
[ 1347s] +    asyncio.wait_for(future, timeout=timeout)
[ 1347s] +  File "/usr/lib64/python3.6/asyncio/base_events.py", line 
488, in run_until_complete
[ 1347s] +    return future.result()
[ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, in 
wait_for
[ 1347s] +    raise futures.TimeoutError()
[ 1347s] +concurrent.futures._base.TimeoutError


I can see other errors like this, it's the problem of the socket.

[ 1535s] socket_accept failed: Resource temporarily unavailable
[ 1535s] **
[ 1535s] 
ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: 
assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)


The script is running the command as this:
/usr/bin/make -O -j4 check-block V=1

I can see the errors on ppc, arm or x86.
But I couldn't reproduce it when I run it manually.

It will be appreciated if any suggestions. Thanks.


On 3/24/22 14:47, Li Zhang wrote:
> Hi,
> 
> When I run the testsuit on our buidling system, it reports a timeout 
> sometimes not always as the following.
> It couldn't connect qmp socket. Any ideas about this problem?
> 
> 
> [ 1989s] --- 
> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
> [ 1989s] +++ 040.out.bad
> [ 1989s] @@ -1,5 +1,55 @@
> [ 1989s] -.................................................................
> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to establish 
> connection: asyncio.exceptions.CancelledError
> [ 1989s] 
> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed 
> to establish connection: asyncio.exceptions.CancelledError
> [ 1989s] +E.........................
> [ 1989s] 
> +======================================================================
> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage)
> [ 1989s] 
> +----------------------------------------------------------------------
> [ 1989s] +Traceback (most recent call last):
> [ 1989s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", line 
> 94, in setUp
> [ 1989s] +    self.vm.launch()
> [ 1989s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
> line 399, in launch
> [ 1989s] +    self._launch()
> [ 1989s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
> line 434, in _launch
> [ 1989s] +    self._post_launch()
> [ 1989s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", 
> line 147, in _post_launch
> [ 1989s] +    super()._post_launch()
> [ 1989s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
> line 340, in _post_launch
> [ 1989s] +    self._qmp.accept(self._qmp_timer)
> [ 1989s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
> line 67, in accept
> [ 1989s] +    self._sync(
> [ 1989s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
> line 41, in _sync
> [ 1989s] +    return self._aloop.run_until_complete(
> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/base_events.py", line 
> 616, in run_until_complete
> [ 1989s] +    return future.result()
> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, in 
> wait_for
> [ 1989s] +    raise exceptions.TimeoutError()
> [ 1989s] +asyncio.exceptions.TimeoutError
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-25 10:17 ` Li Zhang
@ 2022-03-29 16:49   ` Li Zhang
  2022-03-31  6:10     ` Hanna Reitz
  0 siblings, 1 reply; 11+ messages in thread
From: Li Zhang @ 2022-03-29 16:49 UTC (permalink / raw)
  To: QEMU Developers, kwolf, Hanna Reitz

Update what I observed.

It seems that aqmp is not stable when running test cases.
So I revert the patches as the following, iotest40/41 test cases work well.

commit 76cd358671e6b8e7c435ec65b1c44200254514a9

Author: John Snow <jsnow@redhat.com>

Date:   Tue Oct 26 13:56:12 2021 -0400



     python, iotests: replace qmp with aqmp



     Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with the 
sync

     wrapper from qemu.aqmp instead.



     Add an escape hatch in the form of the environment variable

     QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into 
using

     the old implementation, proving that both implementations work

     concurrently.



     Signed-off-by: John Snow <jsnow@redhat.com>

     Reviewed-by: Kevin Wolf <kwolf@redhat.com>

     Reviewed-by: Hanna Reitz <hreitz@redhat.com>

     Message-id: 20211026175612.4127598-9-jsnow@redhat.com

     Signed-off-by: John Snow <jsnow@redhat.com>


commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5

Author: John Snow <jsnow@redhat.com>

Date:   Thu Nov 18 15:46:18 2021 -0500



     python/machine: handle "fast" QEMU terminations



     In the case that the QEMU process actually launches -- but then dies so

     quickly that we can't establish a QMP connection to it -- QEMUMachine

     currently calls _post_shutdown() assuming that it never launched the VM

     process.



     This isn't true, though: it "merely" may have failed to establish a QMP

     connection and the process is in the middle of its own exit path.



     If we don't wait for the subprocess, the caller may get a bogus `None`

     return for .exitcode(). This behavior was observed from

     device-crash-test; after the switch to Async QMP, the timings were

     changed such that it was now seemingly possible to witness the failure

     of "vm.launch()" *prior* to the exitcode becoming available.



     The semantic of the `_launched` property is changed in this

     patch. Instead of representing the condition "launch() executed

     successfully", it will now represent "has forked a child process

     successfully". This way, wait() when called in the exit path won't

     become a no-op.



     Signed-off-by: John Snow <jsnow@redhat.com>

     Reviewed-by: Willian Rampazzo <willianr@redhat.com>

     Message-id: 20211118204620.1897674-6-jsnow@redhat.com

     Signed-off-by: John Snow <jsnow@redhat.com>





On 3/25/22 11:17, Li Zhang wrote:
> Hi,
> 
> I backport some iotests patches to the tree and change timeout.
> It doesn't work.
> 
> Sometimes, iotest41 also reports the errors.
> [ 1347s] 
> +======================================================================
> [ 1347s] +ERROR: test_top_node_in_wrong_chain (__main__.TestSingleDrive)
> [ 1347s] 
> +----------------------------------------------------------------------
> [ 1347s] +Traceback (most recent call last):
> [ 1347s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
> line 399, in launch
> [ 1347s] +    self._launch()
> [ 1347s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
> line 434, in _launch
> [ 1347s] +    self._post_launch()
> [ 1347s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", 
> line 147, in _post_launch
> [ 1347s] +    super()._post_launch()
> [ 1347s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
> line 340, in _post_launch
> [ 1347s] +    self._qmp.accept(self._qmp_timer)
> [ 1347s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
> line 69, in accept
> [ 1347s] +    timeout
> [ 1347s] +  File 
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
> line 42, in _sync
> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/base_events.py", line 
> 488, in run_until_complete
> [ 1347s] +    return future.result()
> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, in 
> wait_for
> [ 1347s] +    raise futures.TimeoutError()
> [ 1347s] +concurrent.futures._base.TimeoutError
> 
> 
> I can see other errors like this, it's the problem of the socket.
> 
> [ 1535s] socket_accept failed: Resource temporarily unavailable
> [ 1535s] **
> [ 1535s] 
> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: 
> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
> 
> 
> The script is running the command as this:
> /usr/bin/make -O -j4 check-block V=1
> 
> I can see the errors on ppc, arm or x86.
> But I couldn't reproduce it when I run it manually.
> 
> It will be appreciated if any suggestions. Thanks.
> 
> 
> On 3/24/22 14:47, Li Zhang wrote:
>> Hi,
>>
>> When I run the testsuit on our buidling system, it reports a timeout 
>> sometimes not always as the following.
>> It couldn't connect qmp socket. Any ideas about this problem?
>>
>>
>> [ 1989s] --- 
>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>> [ 1989s] +++ 040.out.bad
>> [ 1989s] @@ -1,5 +1,55 @@
>> [ 1989s] 
>> -.................................................................
>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to establish 
>> connection: asyncio.exceptions.CancelledError
>> [ 1989s] 
>> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed 
>> to establish connection: asyncio.exceptions.CancelledError
>> [ 1989s] +E.........................
>> [ 1989s] 
>> +======================================================================
>> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage)
>> [ 1989s] 
>> +----------------------------------------------------------------------
>> [ 1989s] +Traceback (most recent call last):
>> [ 1989s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", line 
>> 94, in setUp
>> [ 1989s] +    self.vm.launch()
>> [ 1989s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>> line 399, in launch
>> [ 1989s] +    self._launch()
>> [ 1989s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>> line 434, in _launch
>> [ 1989s] +    self._post_launch()
>> [ 1989s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", 
>> line 147, in _post_launch
>> [ 1989s] +    super()._post_launch()
>> [ 1989s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>> line 340, in _post_launch
>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>> [ 1989s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>> line 67, in accept
>> [ 1989s] +    self._sync(
>> [ 1989s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>> line 41, in _sync
>> [ 1989s] +    return self._aloop.run_until_complete(
>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/base_events.py", line 
>> 616, in run_until_complete
>> [ 1989s] +    return future.result()
>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, in 
>> wait_for
>> [ 1989s] +    raise exceptions.TimeoutError()
>> [ 1989s] +asyncio.exceptions.TimeoutError
>>
> 
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-29 16:49   ` Li Zhang
@ 2022-03-31  6:10     ` Hanna Reitz
  2022-03-31 10:44       ` Li Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Hanna Reitz @ 2022-03-31  6:10 UTC (permalink / raw)
  To: Li Zhang, QEMU Developers, kwolf, John Snow

On 29.03.22 18:49, Li Zhang wrote:
> Update what I observed.
>
> It seems that aqmp is not stable when running test cases.
> So I revert the patches as the following, iotest40/41 test cases work 
> well.

Thanks for bisecting.  I haven’t seen this problem before, so I didn’t 
look into it; CC-ing John, perhaps he figures something.

Hanna

> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>
> Author: John Snow <jsnow@redhat.com>
>
> Date:   Tue Oct 26 13:56:12 2021 -0400
>
>
>
>     python, iotests: replace qmp with aqmp
>
>
>
>     Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with 
> the sync
>
>     wrapper from qemu.aqmp instead.
>
>
>
>     Add an escape hatch in the form of the environment variable
>
>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into 
> using
>
>     the old implementation, proving that both implementations work
>
>     concurrently.
>
>
>
>     Signed-off-by: John Snow <jsnow@redhat.com>
>
>     Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>
>     Reviewed-by: Hanna Reitz <hreitz@redhat.com>
>
>     Message-id: 20211026175612.4127598-9-jsnow@redhat.com
>
>     Signed-off-by: John Snow <jsnow@redhat.com>
>
>
> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>
> Author: John Snow <jsnow@redhat.com>
>
> Date:   Thu Nov 18 15:46:18 2021 -0500
>
>
>
>     python/machine: handle "fast" QEMU terminations
>
>
>
>     In the case that the QEMU process actually launches -- but then 
> dies so
>
>     quickly that we can't establish a QMP connection to it -- QEMUMachine
>
>     currently calls _post_shutdown() assuming that it never launched 
> the VM
>
>     process.
>
>
>
>     This isn't true, though: it "merely" may have failed to establish 
> a QMP
>
>     connection and the process is in the middle of its own exit path.
>
>
>
>     If we don't wait for the subprocess, the caller may get a bogus 
> `None`
>
>     return for .exitcode(). This behavior was observed from
>
>     device-crash-test; after the switch to Async QMP, the timings were
>
>     changed such that it was now seemingly possible to witness the 
> failure
>
>     of "vm.launch()" *prior* to the exitcode becoming available.
>
>
>
>     The semantic of the `_launched` property is changed in this
>
>     patch. Instead of representing the condition "launch() executed
>
>     successfully", it will now represent "has forked a child process
>
>     successfully". This way, wait() when called in the exit path won't
>
>     become a no-op.
>
>
>
>     Signed-off-by: John Snow <jsnow@redhat.com>
>
>     Reviewed-by: Willian Rampazzo <willianr@redhat.com>
>
>     Message-id: 20211118204620.1897674-6-jsnow@redhat.com
>
>     Signed-off-by: John Snow <jsnow@redhat.com>
>
>
>
>
>
> On 3/25/22 11:17, Li Zhang wrote:
>> Hi,
>>
>> I backport some iotests patches to the tree and change timeout.
>> It doesn't work.
>>
>> Sometimes, iotest41 also reports the errors.
>> [ 1347s] 
>> +======================================================================
>> [ 1347s] +ERROR: test_top_node_in_wrong_chain (__main__.TestSingleDrive)
>> [ 1347s] 
>> +----------------------------------------------------------------------
>> [ 1347s] +Traceback (most recent call last):
>> [ 1347s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>> line 399, in launch
>> [ 1347s] +    self._launch()
>> [ 1347s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>> line 434, in _launch
>> [ 1347s] +    self._post_launch()
>> [ 1347s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", 
>> line 147, in _post_launch
>> [ 1347s] +    super()._post_launch()
>> [ 1347s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>> line 340, in _post_launch
>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
>> [ 1347s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>> line 69, in accept
>> [ 1347s] +    timeout
>> [ 1347s] +  File 
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>> line 42, in _sync
>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/base_events.py", line 
>> 488, in run_until_complete
>> [ 1347s] +    return future.result()
>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, 
>> in wait_for
>> [ 1347s] +    raise futures.TimeoutError()
>> [ 1347s] +concurrent.futures._base.TimeoutError
>>
>>
>> I can see other errors like this, it's the problem of the socket.
>>
>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>> [ 1535s] **
>> [ 1535s] 
>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: 
>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>>
>>
>> The script is running the command as this:
>> /usr/bin/make -O -j4 check-block V=1
>>
>> I can see the errors on ppc, arm or x86.
>> But I couldn't reproduce it when I run it manually.
>>
>> It will be appreciated if any suggestions. Thanks.
>>
>>
>> On 3/24/22 14:47, Li Zhang wrote:
>>> Hi,
>>>
>>> When I run the testsuit on our buidling system, it reports a timeout 
>>> sometimes not always as the following.
>>> It couldn't connect qmp socket. Any ideas about this problem?
>>>
>>>
>>> [ 1989s] --- 
>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>>> [ 1989s] +++ 040.out.bad
>>> [ 1989s] @@ -1,5 +1,55 @@
>>> [ 1989s] 
>>> -.................................................................
>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to 
>>> establish connection: asyncio.exceptions.CancelledError
>>> [ 1989s] 
>>> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed 
>>> to establish connection: asyncio.exceptions.CancelledError
>>> [ 1989s] +E.........................
>>> [ 1989s] 
>>> +======================================================================
>>> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage)
>>> [ 1989s] 
>>> +----------------------------------------------------------------------
>>> [ 1989s] +Traceback (most recent call last):
>>> [ 1989s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", 
>>> line 94, in setUp
>>> [ 1989s] +    self.vm.launch()
>>> [ 1989s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>> line 399, in launch
>>> [ 1989s] +    self._launch()
>>> [ 1989s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>> line 434, in _launch
>>> [ 1989s] +    self._post_launch()
>>> [ 1989s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", 
>>> line 147, in _post_launch
>>> [ 1989s] +    super()._post_launch()
>>> [ 1989s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>> line 340, in _post_launch
>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>>> [ 1989s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>>> line 67, in accept
>>> [ 1989s] +    self._sync(
>>> [ 1989s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>>> line 41, in _sync
>>> [ 1989s] +    return self._aloop.run_until_complete(
>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/base_events.py", line 
>>> 616, in run_until_complete
>>> [ 1989s] +    return future.result()
>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, 
>>> in wait_for
>>> [ 1989s] +    raise exceptions.TimeoutError()
>>> [ 1989s] +asyncio.exceptions.TimeoutError
>>>
>>
>>
>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-31  6:10     ` Hanna Reitz
@ 2022-03-31 10:44       ` Li Zhang
  2022-03-31 14:37         ` John Snow
  0 siblings, 1 reply; 11+ messages in thread
From: Li Zhang @ 2022-03-31 10:44 UTC (permalink / raw)
  To: Hanna Reitz, QEMU Developers, kwolf, John Snow

On 3/31/22 08:10, Hanna Reitz wrote:
> On 29.03.22 18:49, Li Zhang wrote:
>> Update what I observed.
>>
>> It seems that aqmp is not stable when running test cases.
>> So I revert the patches as the following, iotest40/41 test cases work 
>> well.
> 
> Thanks for bisecting.  I haven’t seen this problem before, so I didn’t 
> look into it; CC-ing John, perhaps he figures something.
> 
Thanks a lot.
It happens occassionally. I didn't find a way to reproduce it manually.

> Hanna
> 
>> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>>
>> Author: John Snow <jsnow@redhat.com>
>>
>> Date:   Tue Oct 26 13:56:12 2021 -0400
>>
>>
>>
>>     python, iotests: replace qmp with aqmp
>>
>>
>>
>>     Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with 
>> the sync
>>
>>     wrapper from qemu.aqmp instead.
>>
>>
>>
>>     Add an escape hatch in the form of the environment variable
>>
>>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into 
>> using
>>
>>     the old implementation, proving that both implementations work
>>
>>     concurrently.
>>
>>
>>
>>     Signed-off-by: John Snow <jsnow@redhat.com>
>>
>>     Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>>
>>     Reviewed-by: Hanna Reitz <hreitz@redhat.com>
>>
>>     Message-id: 20211026175612.4127598-9-jsnow@redhat.com
>>
>>     Signed-off-by: John Snow <jsnow@redhat.com>
>>
>>
>> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>>
>> Author: John Snow <jsnow@redhat.com>
>>
>> Date:   Thu Nov 18 15:46:18 2021 -0500
>>
>>
>>
>>     python/machine: handle "fast" QEMU terminations
>>
>>
>>
>>     In the case that the QEMU process actually launches -- but then 
>> dies so
>>
>>     quickly that we can't establish a QMP connection to it -- QEMUMachine
>>
>>     currently calls _post_shutdown() assuming that it never launched 
>> the VM
>>
>>     process.
>>
>>
>>
>>     This isn't true, though: it "merely" may have failed to establish 
>> a QMP
>>
>>     connection and the process is in the middle of its own exit path.
>>
>>
>>
>>     If we don't wait for the subprocess, the caller may get a bogus 
>> `None`
>>
>>     return for .exitcode(). This behavior was observed from
>>
>>     device-crash-test; after the switch to Async QMP, the timings were
>>
>>     changed such that it was now seemingly possible to witness the 
>> failure
>>
>>     of "vm.launch()" *prior* to the exitcode becoming available.
>>
>>
>>
>>     The semantic of the `_launched` property is changed in this
>>
>>     patch. Instead of representing the condition "launch() executed
>>
>>     successfully", it will now represent "has forked a child process
>>
>>     successfully". This way, wait() when called in the exit path won't
>>
>>     become a no-op.
>>
>>
>>
>>     Signed-off-by: John Snow <jsnow@redhat.com>
>>
>>     Reviewed-by: Willian Rampazzo <willianr@redhat.com>
>>
>>     Message-id: 20211118204620.1897674-6-jsnow@redhat.com
>>
>>     Signed-off-by: John Snow <jsnow@redhat.com>
>>
>>
>>
>>
>>
>> On 3/25/22 11:17, Li Zhang wrote:
>>> Hi,
>>>
>>> I backport some iotests patches to the tree and change timeout.
>>> It doesn't work.
>>>
>>> Sometimes, iotest41 also reports the errors.
>>> [ 1347s] 
>>> +======================================================================
>>> [ 1347s] +ERROR: test_top_node_in_wrong_chain (__main__.TestSingleDrive)
>>> [ 1347s] 
>>> +----------------------------------------------------------------------
>>> [ 1347s] +Traceback (most recent call last):
>>> [ 1347s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>> line 399, in launch
>>> [ 1347s] +    self._launch()
>>> [ 1347s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>> line 434, in _launch
>>> [ 1347s] +    self._post_launch()
>>> [ 1347s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", line 
>>> 147, in _post_launch
>>> [ 1347s] +    super()._post_launch()
>>> [ 1347s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>> line 340, in _post_launch
>>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
>>> [ 1347s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>>> line 69, in accept
>>> [ 1347s] +    timeout
>>> [ 1347s] +  File 
>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>>> line 42, in _sync
>>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
>>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/base_events.py", line 
>>> 488, in run_until_complete
>>> [ 1347s] +    return future.result()
>>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, 
>>> in wait_for
>>> [ 1347s] +    raise futures.TimeoutError()
>>> [ 1347s] +concurrent.futures._base.TimeoutError
>>>
>>>
>>> I can see other errors like this, it's the problem of the socket.
>>>
>>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>>> [ 1535s] **
>>> [ 1535s] 
>>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: 
>>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>>>
>>>
>>> The script is running the command as this:
>>> /usr/bin/make -O -j4 check-block V=1
>>>
>>> I can see the errors on ppc, arm or x86.
>>> But I couldn't reproduce it when I run it manually.
>>>
>>> It will be appreciated if any suggestions. Thanks.
>>>
>>>
>>> On 3/24/22 14:47, Li Zhang wrote:
>>>> Hi,
>>>>
>>>> When I run the testsuit on our buidling system, it reports a timeout 
>>>> sometimes not always as the following.
>>>> It couldn't connect qmp socket. Any ideas about this problem?
>>>>
>>>>
>>>> [ 1989s] --- 
>>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>>>> [ 1989s] +++ 040.out.bad
>>>> [ 1989s] @@ -1,5 +1,55 @@
>>>> [ 1989s] 
>>>> -.................................................................
>>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to 
>>>> establish connection: asyncio.exceptions.CancelledError
>>>> [ 1989s] 
>>>> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed 
>>>> to establish connection: asyncio.exceptions.CancelledError
>>>> [ 1989s] +E.........................
>>>> [ 1989s] 
>>>> +======================================================================
>>>> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage)
>>>> [ 1989s] 
>>>> +----------------------------------------------------------------------
>>>> [ 1989s] +Traceback (most recent call last):
>>>> [ 1989s] +  File 
>>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", 
>>>> line 94, in setUp
>>>> [ 1989s] +    self.vm.launch()
>>>> [ 1989s] +  File 
>>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>>> line 399, in launch
>>>> [ 1989s] +    self._launch()
>>>> [ 1989s] +  File 
>>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>>> line 434, in _launch
>>>> [ 1989s] +    self._post_launch()
>>>> [ 1989s] +  File 
>>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", 
>>>> line 147, in _post_launch
>>>> [ 1989s] +    super()._post_launch()
>>>> [ 1989s] +  File 
>>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", 
>>>> line 340, in _post_launch
>>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>>>> [ 1989s] +  File 
>>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>>>> line 67, in accept
>>>> [ 1989s] +    self._sync(
>>>> [ 1989s] +  File 
>>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", 
>>>> line 41, in _sync
>>>> [ 1989s] +    return self._aloop.run_until_complete(
>>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/base_events.py", line 
>>>> 616, in run_until_complete
>>>> [ 1989s] +    return future.result()
>>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, 
>>>> in wait_for
>>>> [ 1989s] +    raise exceptions.TimeoutError()
>>>> [ 1989s] +asyncio.exceptions.TimeoutError
>>>>
>>>
>>>
>>
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-31 10:44       ` Li Zhang
@ 2022-03-31 14:37         ` John Snow
  2022-03-31 14:46           ` John Snow
  2022-03-31 15:16           ` Li Zhang
  0 siblings, 2 replies; 11+ messages in thread
From: John Snow @ 2022-03-31 14:37 UTC (permalink / raw)
  To: Li Zhang; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers

[-- Attachment #1: Type: text/plain, Size: 9146 bytes --]

On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de> wrote:

> On 3/31/22 08:10, Hanna Reitz wrote:
> > On 29.03.22 18:49, Li Zhang wrote:
> >> Update what I observed.
> >>
> >> It seems that aqmp is not stable when running test cases.
> >> So I revert the patches as the following, iotest40/41 test cases work
> >> well.
> >
> > Thanks for bisecting.  I haven’t seen this problem before, so I didn’t
> > look into it; CC-ing John, perhaps he figures something.
> >
> Thanks a lot.
> It happens occassionally. I didn't find a way to reproduce it manually.
>

Sorry for the instability.

A few questions then:

- what operating system and version
- kernel version
- architecture (looks like x86 and others?)
- what commit/version of QEMU are you testing?
- what python version are you using to run iotests?



> > Hanna
> >
> >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
> >>
> >> Author: John Snow <jsnow@redhat.com>
> >>
> >> Date:   Tue Oct 26 13:56:12 2021 -0400
> >>
> >>
> >>
> >>     python, iotests: replace qmp with aqmp
> >>
> >>
> >>
> >>     Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with
> >> the sync
> >>
> >>     wrapper from qemu.aqmp instead.
> >>
> >>
> >>
> >>     Add an escape hatch in the form of the environment variable
> >>
> >>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into
> >> using
> >>
> >>     the old implementation, proving that both implementations work
> >>
> >>     concurrently.
> >>
> >>
> >>
> >>     Signed-off-by: John Snow <jsnow@redhat.com>
> >>
> >>     Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> >>
> >>     Reviewed-by: Hanna Reitz <hreitz@redhat.com>
> >>
> >>     Message-id: 20211026175612.4127598-9-jsnow@redhat.com
> >>
> >>     Signed-off-by: John Snow <jsnow@redhat.com>
> >>
> >>
> >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
> >>
> >> Author: John Snow <jsnow@redhat.com>
> >>
> >> Date:   Thu Nov 18 15:46:18 2021 -0500
> >>
> >>
> >>
> >>     python/machine: handle "fast" QEMU terminations
> >>
> >>
> >>
> >>     In the case that the QEMU process actually launches -- but then
> >> dies so
> >>
> >>     quickly that we can't establish a QMP connection to it --
> QEMUMachine
> >>
> >>     currently calls _post_shutdown() assuming that it never launched
> >> the VM
> >>
> >>     process.
> >>
> >>
> >>
> >>     This isn't true, though: it "merely" may have failed to establish
> >> a QMP
> >>
> >>     connection and the process is in the middle of its own exit path.
> >>
> >>
> >>
> >>     If we don't wait for the subprocess, the caller may get a bogus
> >> `None`
> >>
> >>     return for .exitcode(). This behavior was observed from
> >>
> >>     device-crash-test; after the switch to Async QMP, the timings were
> >>
> >>     changed such that it was now seemingly possible to witness the
> >> failure
> >>
> >>     of "vm.launch()" *prior* to the exitcode becoming available.
> >>
> >>
> >>
> >>     The semantic of the `_launched` property is changed in this
> >>
> >>     patch. Instead of representing the condition "launch() executed
> >>
> >>     successfully", it will now represent "has forked a child process
> >>
> >>     successfully". This way, wait() when called in the exit path won't
> >>
> >>     become a no-op.
> >>
> >>
> >>
> >>     Signed-off-by: John Snow <jsnow@redhat.com>
> >>
> >>     Reviewed-by: Willian Rampazzo <willianr@redhat.com>
> >>
> >>     Message-id: 20211118204620.1897674-6-jsnow@redhat.com
> >>
> >>     Signed-off-by: John Snow <jsnow@redhat.com>
> >>
> >>
> >>
> >>
> >>
> >> On 3/25/22 11:17, Li Zhang wrote:
> >>> Hi,
> >>>
> >>> I backport some iotests patches to the tree and change timeout.
> >>> It doesn't work.
> >>>
> >>> Sometimes, iotest41 also reports the errors.
> >>> [ 1347s]
> >>> +======================================================================
> >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain
> (__main__.TestSingleDrive)
> >>> [ 1347s]
> >>> +----------------------------------------------------------------------
> >>> [ 1347s] +Traceback (most recent call last):
> >>> [ 1347s] +  File
> >>>
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> >>> line 399, in launch
> >>> [ 1347s] +    self._launch()
> >>> [ 1347s] +  File
> >>>
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> >>> line 434, in _launch
> >>> [ 1347s] +    self._post_launch()
> >>> [ 1347s] +  File
> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
> line
> >>> 147, in _post_launch
> >>> [ 1347s] +    super()._post_launch()
> >>> [ 1347s] +  File
> >>>
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> >>> line 340, in _post_launch
> >>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
> >>> [ 1347s] +  File
> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> >>> line 69, in accept
> >>> [ 1347s] +    timeout
> >>> [ 1347s] +  File
> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> >>> line 42, in _sync
> >>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
> >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/base_events.py", line
> >>> 488, in run_until_complete
> >>> [ 1347s] +    return future.result()
> >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py", line 362,
> >>> in wait_for
> >>> [ 1347s] +    raise futures.TimeoutError()
> >>> [ 1347s] +concurrent.futures._base.TimeoutError
> >>>
> >>>
> >>> I can see other errors like this, it's the problem of the socket.
> >>>
> >>> [ 1535s] socket_accept failed: Resource temporarily unavailable
> >>> [ 1535s] **
> >>> [ 1535s]
> >>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
> >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
> >>>
> >>>
> >>> The script is running the command as this:
> >>> /usr/bin/make -O -j4 check-block V=1
> >>>
> >>> I can see the errors on ppc, arm or x86.
> >>> But I couldn't reproduce it when I run it manually.
> >>>
> >>> It will be appreciated if any suggestions. Thanks.
> >>>
> >>>
> >>> On 3/24/22 14:47, Li Zhang wrote:
> >>>> Hi,
> >>>>
> >>>> When I run the testsuit on our buidling system, it reports a timeout
> >>>> sometimes not always as the following.
> >>>> It couldn't connect qmp socket. Any ideas about this problem?
> >>>>
> >>>>
> >>>> [ 1989s] ---
> >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
> >>>> [ 1989s] +++ 040.out.bad
> >>>> [ 1989s] @@ -1,5 +1,55 @@
> >>>> [ 1989s]
> >>>> -.................................................................
> >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to
> >>>> establish connection: asyncio.exceptions.CancelledError
> >>>> [ 1989s]
> >>>>
> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed
>
> >>>> to establish connection: asyncio.exceptions.CancelledError
> >>>> [ 1989s] +E.........................
> >>>> [ 1989s]
> >>>>
> +======================================================================
> >>>> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage)
> >>>> [ 1989s]
> >>>>
> +----------------------------------------------------------------------
> >>>> [ 1989s] +Traceback (most recent call last):
> >>>> [ 1989s] +  File
> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040",
> >>>> line 94, in setUp
> >>>> [ 1989s] +    self.vm.launch()
> >>>> [ 1989s] +  File
> >>>>
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> >>>> line 399, in launch
> >>>> [ 1989s] +    self._launch()
> >>>> [ 1989s] +  File
> >>>>
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> >>>> line 434, in _launch
> >>>> [ 1989s] +    self._post_launch()
> >>>> [ 1989s] +  File
> >>>>
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
> >>>> line 147, in _post_launch
> >>>> [ 1989s] +    super()._post_launch()
> >>>> [ 1989s] +  File
> >>>>
> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> >>>> line 340, in _post_launch
> >>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
> >>>> [ 1989s] +  File
> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> >>>> line 67, in accept
> >>>> [ 1989s] +    self._sync(
> >>>> [ 1989s] +  File
> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> >>>> line 41, in _sync
> >>>> [ 1989s] +    return self._aloop.run_until_complete(
> >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/base_events.py", line
> >>>> 616, in run_until_complete
> >>>> [ 1989s] +    return future.result()
> >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line 501,
> >>>> in wait_for
> >>>> [ 1989s] +    raise exceptions.TimeoutError()
> >>>> [ 1989s] +asyncio.exceptions.TimeoutError
> >>>>
> >>>
> >>>
> >>
> >
>
>
>

[-- Attachment #2: Type: text/html, Size: 13918 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-31 14:37         ` John Snow
@ 2022-03-31 14:46           ` John Snow
  2022-03-31 15:18             ` Li Zhang
  2022-04-06 14:48             ` Li Zhang
  2022-03-31 15:16           ` Li Zhang
  1 sibling, 2 replies; 11+ messages in thread
From: John Snow @ 2022-03-31 14:46 UTC (permalink / raw)
  To: Li Zhang; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers

[-- Attachment #1: Type: text/plain, Size: 9972 bytes --]

On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com> wrote:

>
>
> On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de> wrote:
>
>> On 3/31/22 08:10, Hanna Reitz wrote:
>> > On 29.03.22 18:49, Li Zhang wrote:
>> >> Update what I observed.
>> >>
>> >> It seems that aqmp is not stable when running test cases.
>> >> So I revert the patches as the following, iotest40/41 test cases work
>> >> well.
>> >
>> > Thanks for bisecting.  I haven’t seen this problem before, so I didn’t
>> > look into it; CC-ing John, perhaps he figures something.
>> >
>> Thanks a lot.
>> It happens occassionally. I didn't find a way to reproduce it manually.
>>
>
> Sorry for the instability.
>
> A few questions then:
>
> - what operating system and version
> - kernel version
> - architecture (looks like x86 and others?)
> - what commit/version of QEMU are you testing?
> - what python version are you using to run iotests?
>

I'll also point out these patchsets, because we're in different timezones
and if I'm lucky it will be the answer:

[1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/

[2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/

#1 wasn't sufficient to fix the problem, but I had hoped #2 was. The second
patchset was merged somewhat recently, just before rc testing for 7.0.

Thanks,
--js



>
>
>> > Hanna
>> >
>> >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>> >>
>> >> Author: John Snow <jsnow@redhat.com>
>> >>
>> >> Date:   Tue Oct 26 13:56:12 2021 -0400
>> >>
>> >>
>> >>
>> >>     python, iotests: replace qmp with aqmp
>> >>
>> >>
>> >>
>> >>     Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with
>> >> the sync
>> >>
>> >>     wrapper from qemu.aqmp instead.
>> >>
>> >>
>> >>
>> >>     Add an escape hatch in the form of the environment variable
>> >>
>> >>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into
>> >> using
>> >>
>> >>     the old implementation, proving that both implementations work
>> >>
>> >>     concurrently.
>> >>
>> >>
>> >>
>> >>     Signed-off-by: John Snow <jsnow@redhat.com>
>> >>
>> >>     Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>> >>
>> >>     Reviewed-by: Hanna Reitz <hreitz@redhat.com>
>> >>
>> >>     Message-id: 20211026175612.4127598-9-jsnow@redhat.com
>> >>
>> >>     Signed-off-by: John Snow <jsnow@redhat.com>
>> >>
>> >>
>> >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>> >>
>> >> Author: John Snow <jsnow@redhat.com>
>> >>
>> >> Date:   Thu Nov 18 15:46:18 2021 -0500
>> >>
>> >>
>> >>
>> >>     python/machine: handle "fast" QEMU terminations
>> >>
>> >>
>> >>
>> >>     In the case that the QEMU process actually launches -- but then
>> >> dies so
>> >>
>> >>     quickly that we can't establish a QMP connection to it --
>> QEMUMachine
>> >>
>> >>     currently calls _post_shutdown() assuming that it never launched
>> >> the VM
>> >>
>> >>     process.
>> >>
>> >>
>> >>
>> >>     This isn't true, though: it "merely" may have failed to establish
>> >> a QMP
>> >>
>> >>     connection and the process is in the middle of its own exit path.
>> >>
>> >>
>> >>
>> >>     If we don't wait for the subprocess, the caller may get a bogus
>> >> `None`
>> >>
>> >>     return for .exitcode(). This behavior was observed from
>> >>
>> >>     device-crash-test; after the switch to Async QMP, the timings were
>> >>
>> >>     changed such that it was now seemingly possible to witness the
>> >> failure
>> >>
>> >>     of "vm.launch()" *prior* to the exitcode becoming available.
>> >>
>> >>
>> >>
>> >>     The semantic of the `_launched` property is changed in this
>> >>
>> >>     patch. Instead of representing the condition "launch() executed
>> >>
>> >>     successfully", it will now represent "has forked a child process
>> >>
>> >>     successfully". This way, wait() when called in the exit path won't
>> >>
>> >>     become a no-op.
>> >>
>> >>
>> >>
>> >>     Signed-off-by: John Snow <jsnow@redhat.com>
>> >>
>> >>     Reviewed-by: Willian Rampazzo <willianr@redhat.com>
>> >>
>> >>     Message-id: 20211118204620.1897674-6-jsnow@redhat.com
>> >>
>> >>     Signed-off-by: John Snow <jsnow@redhat.com>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 3/25/22 11:17, Li Zhang wrote:
>> >>> Hi,
>> >>>
>> >>> I backport some iotests patches to the tree and change timeout.
>> >>> It doesn't work.
>> >>>
>> >>> Sometimes, iotest41 also reports the errors.
>> >>> [ 1347s]
>> >>>
>> +======================================================================
>> >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain
>> (__main__.TestSingleDrive)
>> >>> [ 1347s]
>> >>>
>> +----------------------------------------------------------------------
>> >>> [ 1347s] +Traceback (most recent call last):
>> >>> [ 1347s] +  File
>> >>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>> line 399, in launch
>> >>> [ 1347s] +    self._launch()
>> >>> [ 1347s] +  File
>> >>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>> line 434, in _launch
>> >>> [ 1347s] +    self._post_launch()
>> >>> [ 1347s] +  File
>> >>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", line
>> >>> 147, in _post_launch
>> >>> [ 1347s] +    super()._post_launch()
>> >>> [ 1347s] +  File
>> >>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>> line 340, in _post_launch
>> >>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
>> >>> [ 1347s] +  File
>> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>> >>> line 69, in accept
>> >>> [ 1347s] +    timeout
>> >>> [ 1347s] +  File
>> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>> >>> line 42, in _sync
>> >>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
>> >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/base_events.py", line
>> >>> 488, in run_until_complete
>> >>> [ 1347s] +    return future.result()
>> >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py", line 362,
>> >>> in wait_for
>> >>> [ 1347s] +    raise futures.TimeoutError()
>> >>> [ 1347s] +concurrent.futures._base.TimeoutError
>> >>>
>> >>>
>> >>> I can see other errors like this, it's the problem of the socket.
>> >>>
>> >>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>> >>> [ 1535s] **
>> >>> [ 1535s]
>> >>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
>> >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>> >>>
>> >>>
>> >>> The script is running the command as this:
>> >>> /usr/bin/make -O -j4 check-block V=1
>> >>>
>> >>> I can see the errors on ppc, arm or x86.
>> >>> But I couldn't reproduce it when I run it manually.
>> >>>
>> >>> It will be appreciated if any suggestions. Thanks.
>> >>>
>> >>>
>> >>> On 3/24/22 14:47, Li Zhang wrote:
>> >>>> Hi,
>> >>>>
>> >>>> When I run the testsuit on our buidling system, it reports a timeout
>> >>>> sometimes not always as the following.
>> >>>> It couldn't connect qmp socket. Any ideas about this problem?
>> >>>>
>> >>>>
>> >>>> [ 1989s] ---
>> >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>> >>>> [ 1989s] +++ 040.out.bad
>> >>>> [ 1989s] @@ -1,5 +1,55 @@
>> >>>> [ 1989s]
>> >>>> -.................................................................
>> >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to
>> >>>> establish connection: asyncio.exceptions.CancelledError
>> >>>> [ 1989s]
>> >>>>
>> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed
>>
>> >>>> to establish connection: asyncio.exceptions.CancelledError
>> >>>> [ 1989s] +E.........................
>> >>>> [ 1989s]
>> >>>>
>> +======================================================================
>> >>>> [ 1989s] +ERROR: test_commit_node
>> (__main__.TestActiveZeroLengthImage)
>> >>>> [ 1989s]
>> >>>>
>> +----------------------------------------------------------------------
>> >>>> [ 1989s] +Traceback (most recent call last):
>> >>>> [ 1989s] +  File
>> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040",
>> >>>> line 94, in setUp
>> >>>> [ 1989s] +    self.vm.launch()
>> >>>> [ 1989s] +  File
>> >>>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>>> line 399, in launch
>> >>>> [ 1989s] +    self._launch()
>> >>>> [ 1989s] +  File
>> >>>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>>> line 434, in _launch
>> >>>> [ 1989s] +    self._post_launch()
>> >>>> [ 1989s] +  File
>> >>>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
>> >>>> line 147, in _post_launch
>> >>>> [ 1989s] +    super()._post_launch()
>> >>>> [ 1989s] +  File
>> >>>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>>> line 340, in _post_launch
>> >>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>> >>>> [ 1989s] +  File
>> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>> >>>> line 67, in accept
>> >>>> [ 1989s] +    self._sync(
>> >>>> [ 1989s] +  File
>> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>> >>>> line 41, in _sync
>> >>>> [ 1989s] +    return self._aloop.run_until_complete(
>> >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/base_events.py", line
>> >>>> 616, in run_until_complete
>> >>>> [ 1989s] +    return future.result()
>> >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line 501,
>> >>>> in wait_for
>> >>>> [ 1989s] +    raise exceptions.TimeoutError()
>> >>>> [ 1989s] +asyncio.exceptions.TimeoutError
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>>

[-- Attachment #2: Type: text/html, Size: 15491 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-31 14:37         ` John Snow
  2022-03-31 14:46           ` John Snow
@ 2022-03-31 15:16           ` Li Zhang
  1 sibling, 0 replies; 11+ messages in thread
From: Li Zhang @ 2022-03-31 15:16 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers

On 3/31/22 16:37, John Snow wrote:
> 
> 
> On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de 
> <mailto:lizhang@suse.de>> wrote:
> 
>     On 3/31/22 08:10, Hanna Reitz wrote:
>      > On 29.03.22 18:49, Li Zhang wrote:
>      >> Update what I observed.
>      >>
>      >> It seems that aqmp is not stable when running test cases.
>      >> So I revert the patches as the following, iotest40/41 test cases
>     work
>      >> well.
>      >
>      > Thanks for bisecting.  I haven’t seen this problem before, so I
>     didn’t
>      > look into it; CC-ing John, perhaps he figures something.
>      >
>     Thanks a lot.
>     It happens occassionally. I didn't find a way to reproduce it manually.
> 
> 
> Sorry for the instability.
> 
> A few questions then:
> 
> - what operating system and version
> - kernel version
> - architecture (looks like x86 and others?)
> - what commit/version of QEMU are you testing?
> - what python version are you using to run iotests?
> 
Hi John,

Thanks for your reply, here is the answer.

- openSUSE15.3, openSUSE15.4
- Kernel version:
   openSUSE15.3: kernel-obs-build-5.3.18-150300.59.60.4
   openSUSE15.4: kernel-obs-build-5.14.21-150400.15.2
- architecture: X86, aarch64, s390
- QEMU version is based on qemu6.2.0, I also backport iotest patches 
after QEMU6.2.0
- python:
   openSUSE 15.3:  python3-base-3.6.15-150300.10.21.1
   openSuse15.4:  python3-base-3.6.15-150300.10.21.1

This issue happens on S390, aarch64 a lot.



> 
> 
>      > Hanna
>      >
>      >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>      >>
>      >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>>
>      >>
>      >> Date:   Tue Oct 26 13:56:12 2021 -0400
>      >>
>      >>
>      >>
>      >>     python, iotests: replace qmp with aqmp
>      >>
>      >>
>      >>
>      >>     Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with
>      >> the sync
>      >>
>      >>     wrapper from qemu.aqmp instead.
>      >>
>      >>
>      >>
>      >>     Add an escape hatch in the form of the environment variable
>      >>
>      >>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole
>     QEMUMachine into
>      >> using
>      >>
>      >>     the old implementation, proving that both implementations work
>      >>
>      >>     concurrently.
>      >>
>      >>
>      >>
>      >>     Signed-off-by: John Snow <jsnow@redhat.com
>     <mailto:jsnow@redhat.com>>
>      >>
>      >>     Reviewed-by: Kevin Wolf <kwolf@redhat.com
>     <mailto:kwolf@redhat.com>>
>      >>
>      >>     Reviewed-by: Hanna Reitz <hreitz@redhat.com
>     <mailto:hreitz@redhat.com>>
>      >>
>      >>     Message-id: 20211026175612.4127598-9-jsnow@redhat.com
>     <mailto:20211026175612.4127598-9-jsnow@redhat.com>
>      >>
>      >>     Signed-off-by: John Snow <jsnow@redhat.com
>     <mailto:jsnow@redhat.com>>
>      >>
>      >>
>      >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>      >>
>      >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>>
>      >>
>      >> Date:   Thu Nov 18 15:46:18 2021 -0500
>      >>
>      >>
>      >>
>      >>     python/machine: handle "fast" QEMU terminations
>      >>
>      >>
>      >>
>      >>     In the case that the QEMU process actually launches -- but then
>      >> dies so
>      >>
>      >>     quickly that we can't establish a QMP connection to it --
>     QEMUMachine
>      >>
>      >>     currently calls _post_shutdown() assuming that it never
>     launched
>      >> the VM
>      >>
>      >>     process.
>      >>
>      >>
>      >>
>      >>     This isn't true, though: it "merely" may have failed to
>     establish
>      >> a QMP
>      >>
>      >>     connection and the process is in the middle of its own exit
>     path.
>      >>
>      >>
>      >>
>      >>     If we don't wait for the subprocess, the caller may get a bogus
>      >> `None`
>      >>
>      >>     return for .exitcode(). This behavior was observed from
>      >>
>      >>     device-crash-test; after the switch to Async QMP, the
>     timings were
>      >>
>      >>     changed such that it was now seemingly possible to witness the
>      >> failure
>      >>
>      >>     of "vm.launch()" *prior* to the exitcode becoming available.
>      >>
>      >>
>      >>
>      >>     The semantic of the `_launched` property is changed in this
>      >>
>      >>     patch. Instead of representing the condition "launch() executed
>      >>
>      >>     successfully", it will now represent "has forked a child process
>      >>
>      >>     successfully". This way, wait() when called in the exit path
>     won't
>      >>
>      >>     become a no-op.
>      >>
>      >>
>      >>
>      >>     Signed-off-by: John Snow <jsnow@redhat.com
>     <mailto:jsnow@redhat.com>>
>      >>
>      >>     Reviewed-by: Willian Rampazzo <willianr@redhat.com
>     <mailto:willianr@redhat.com>>
>      >>
>      >>     Message-id: 20211118204620.1897674-6-jsnow@redhat.com
>     <mailto:20211118204620.1897674-6-jsnow@redhat.com>
>      >>
>      >>     Signed-off-by: John Snow <jsnow@redhat.com
>     <mailto:jsnow@redhat.com>>
>      >>
>      >>
>      >>
>      >>
>      >>
>      >> On 3/25/22 11:17, Li Zhang wrote:
>      >>> Hi,
>      >>>
>      >>> I backport some iotests patches to the tree and change timeout.
>      >>> It doesn't work.
>      >>>
>      >>> Sometimes, iotest41 also reports the errors.
>      >>> [ 1347s]
>      >>>
>     +======================================================================
>      >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain
>     (__main__.TestSingleDrive)
>      >>> [ 1347s]
>      >>>
>     +----------------------------------------------------------------------
>      >>> [ 1347s] +Traceback (most recent call last):
>      >>> [ 1347s] +  File
>      >>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>      >>> line 399, in launch
>      >>> [ 1347s] +    self._launch()
>      >>> [ 1347s] +  File
>      >>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>      >>> line 434, in _launch
>      >>> [ 1347s] +    self._post_launch()
>      >>> [ 1347s] +  File
>      >>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
>     line
>      >>> 147, in _post_launch
>      >>> [ 1347s] +    super()._post_launch()
>      >>> [ 1347s] +  File
>      >>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>      >>> line 340, in _post_launch
>      >>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
>      >>> [ 1347s] +  File
>      >>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>      >>> line 69, in accept
>      >>> [ 1347s] +    timeout
>      >>> [ 1347s] +  File
>      >>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>      >>> line 42, in _sync
>      >>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
>      >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/base_events.py",
>     line
>      >>> 488, in run_until_complete
>      >>> [ 1347s] +    return future.result()
>      >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py", line
>     362,
>      >>> in wait_for
>      >>> [ 1347s] +    raise futures.TimeoutError()
>      >>> [ 1347s] +concurrent.futures._base.TimeoutError
>      >>>
>      >>>
>      >>> I can see other errors like this, it's the problem of the socket.
>      >>>
>      >>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>      >>> [ 1535s] **
>      >>> [ 1535s]
>      >>>
>     ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
>      >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>      >>>
>      >>>
>      >>> The script is running the command as this:
>      >>> /usr/bin/make -O -j4 check-block V=1
>      >>>
>      >>> I can see the errors on ppc, arm or x86.
>      >>> But I couldn't reproduce it when I run it manually.
>      >>>
>      >>> It will be appreciated if any suggestions. Thanks.
>      >>>
>      >>>
>      >>> On 3/24/22 14:47, Li Zhang wrote:
>      >>>> Hi,
>      >>>>
>      >>>> When I run the testsuit on our buidling system, it reports a
>     timeout
>      >>>> sometimes not always as the following.
>      >>>> It couldn't connect qmp socket. Any ideas about this problem?
>      >>>>
>      >>>>
>      >>>> [ 1989s] ---
>      >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>      >>>> [ 1989s] +++ 040.out.bad
>      >>>> [ 1989s] @@ -1,5 +1,55 @@
>      >>>> [ 1989s]
>      >>>> -.................................................................
>      >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to
>      >>>> establish connection: asyncio.exceptions.CancelledError
>      >>>> [ 1989s]
>      >>>>
>     +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed
> 
>      >>>> to establish connection: asyncio.exceptions.CancelledError
>      >>>> [ 1989s] +E.........................
>      >>>> [ 1989s]
>      >>>>
>     +======================================================================
>      >>>> [ 1989s] +ERROR: test_commit_node
>     (__main__.TestActiveZeroLengthImage)
>      >>>> [ 1989s]
>      >>>>
>     +----------------------------------------------------------------------
>      >>>> [ 1989s] +Traceback (most recent call last):
>      >>>> [ 1989s] +  File
>      >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040",
>      >>>> line 94, in setUp
>      >>>> [ 1989s] +    self.vm.launch()
>      >>>> [ 1989s] +  File
>      >>>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>      >>>> line 399, in launch
>      >>>> [ 1989s] +    self._launch()
>      >>>> [ 1989s] +  File
>      >>>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>      >>>> line 434, in _launch
>      >>>> [ 1989s] +    self._post_launch()
>      >>>> [ 1989s] +  File
>      >>>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
>      >>>> line 147, in _post_launch
>      >>>> [ 1989s] +    super()._post_launch()
>      >>>> [ 1989s] +  File
>      >>>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>      >>>> line 340, in _post_launch
>      >>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>      >>>> [ 1989s] +  File
>      >>>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>      >>>> line 67, in accept
>      >>>> [ 1989s] +    self._sync(
>      >>>> [ 1989s] +  File
>      >>>>
>     "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>      >>>> line 41, in _sync
>      >>>> [ 1989s] +    return self._aloop.run_until_complete(
>      >>>> [ 1989s] +  File
>     "/usr/lib64/python3.8/asyncio/base_events.py", line
>      >>>> 616, in run_until_complete
>      >>>> [ 1989s] +    return future.result()
>      >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line
>     501,
>      >>>> in wait_for
>      >>>> [ 1989s] +    raise exceptions.TimeoutError()
>      >>>> [ 1989s] +asyncio.exceptions.TimeoutError
>      >>>>
>      >>>
>      >>>
>      >>
>      >
> 
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-31 14:46           ` John Snow
@ 2022-03-31 15:18             ` Li Zhang
  2022-04-06 14:48             ` Li Zhang
  1 sibling, 0 replies; 11+ messages in thread
From: Li Zhang @ 2022-03-31 15:18 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers

On 3/31/22 16:46, John Snow wrote:
> 
> 
> On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com 
> <mailto:jsnow@redhat.com>> wrote:
> 
> 
> 
>     On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de
>     <mailto:lizhang@suse.de>> wrote:
> 
>         On 3/31/22 08:10, Hanna Reitz wrote:
>          > On 29.03.22 18:49, Li Zhang wrote:
>          >> Update what I observed.
>          >>
>          >> It seems that aqmp is not stable when running test cases.
>          >> So I revert the patches as the following, iotest40/41 test
>         cases work
>          >> well.
>          >
>          > Thanks for bisecting.  I haven’t seen this problem before, so
>         I didn’t
>          > look into it; CC-ing John, perhaps he figures something.
>          >
>         Thanks a lot.
>         It happens occassionally. I didn't find a way to reproduce it
>         manually.
> 
> 
>     Sorry for the instability.
> 
>     A few questions then:
> 
>     - what operating system and version
>     - kernel version
>     - architecture (looks like x86 and others?)
>     - what commit/version of QEMU are you testing?
>     - what python version are you using to run iotests?
> 
> 
> I'll also point out these patchsets, because we're in different 
> timezones and if I'm lucky it will be the answer:
> 
> [1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/ 
> <https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/>
> 
> [2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/ 
> <https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/>
> 
> #1 wasn't sufficient to fix the problem, but I had hoped #2 was. The 
> second patchset was merged somewhat recently, just before rc testing for 
> 7.0.
> 

That's great, thanks. I consider to backport them and do some testing.


> Thanks,
> --js
> 
> 
> 
> 
> 
>          > Hanna
>          >
>          >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>          >>
>          >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>>
>          >>
>          >> Date:   Tue Oct 26 13:56:12 2021 -0400
>          >>
>          >>
>          >>
>          >>     python, iotests: replace qmp with aqmp
>          >>
>          >>
>          >>
>          >>     Swap out the synchronous QEMUMonitorProtocol from
>         qemu.qmp with
>          >> the sync
>          >>
>          >>     wrapper from qemu.aqmp instead.
>          >>
>          >>
>          >>
>          >>     Add an escape hatch in the form of the environment variable
>          >>
>          >>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole
>         QEMUMachine into
>          >> using
>          >>
>          >>     the old implementation, proving that both
>         implementations work
>          >>
>          >>     concurrently.
>          >>
>          >>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>     Reviewed-by: Kevin Wolf <kwolf@redhat.com
>         <mailto:kwolf@redhat.com>>
>          >>
>          >>     Reviewed-by: Hanna Reitz <hreitz@redhat.com
>         <mailto:hreitz@redhat.com>>
>          >>
>          >>     Message-id: 20211026175612.4127598-9-jsnow@redhat.com
>         <mailto:20211026175612.4127598-9-jsnow@redhat.com>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>
>          >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>          >>
>          >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>>
>          >>
>          >> Date:   Thu Nov 18 15:46:18 2021 -0500
>          >>
>          >>
>          >>
>          >>     python/machine: handle "fast" QEMU terminations
>          >>
>          >>
>          >>
>          >>     In the case that the QEMU process actually launches --
>         but then
>          >> dies so
>          >>
>          >>     quickly that we can't establish a QMP connection to it
>         -- QEMUMachine
>          >>
>          >>     currently calls _post_shutdown() assuming that it never
>         launched
>          >> the VM
>          >>
>          >>     process.
>          >>
>          >>
>          >>
>          >>     This isn't true, though: it "merely" may have failed to
>         establish
>          >> a QMP
>          >>
>          >>     connection and the process is in the middle of its own
>         exit path.
>          >>
>          >>
>          >>
>          >>     If we don't wait for the subprocess, the caller may get
>         a bogus
>          >> `None`
>          >>
>          >>     return for .exitcode(). This behavior was observed from
>          >>
>          >>     device-crash-test; after the switch to Async QMP, the
>         timings were
>          >>
>          >>     changed such that it was now seemingly possible to
>         witness the
>          >> failure
>          >>
>          >>     of "vm.launch()" *prior* to the exitcode becoming available.
>          >>
>          >>
>          >>
>          >>     The semantic of the `_launched` property is changed in this
>          >>
>          >>     patch. Instead of representing the condition "launch()
>         executed
>          >>
>          >>     successfully", it will now represent "has forked a child
>         process
>          >>
>          >>     successfully". This way, wait() when called in the exit
>         path won't
>          >>
>          >>     become a no-op.
>          >>
>          >>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>     Reviewed-by: Willian Rampazzo <willianr@redhat.com
>         <mailto:willianr@redhat.com>>
>          >>
>          >>     Message-id: 20211118204620.1897674-6-jsnow@redhat.com
>         <mailto:20211118204620.1897674-6-jsnow@redhat.com>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>
>          >>
>          >>
>          >>
>          >> On 3/25/22 11:17, Li Zhang wrote:
>          >>> Hi,
>          >>>
>          >>> I backport some iotests patches to the tree and change timeout.
>          >>> It doesn't work.
>          >>>
>          >>> Sometimes, iotest41 also reports the errors.
>          >>> [ 1347s]
>          >>>
>         +======================================================================
>          >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain
>         (__main__.TestSingleDrive)
>          >>> [ 1347s]
>          >>>
>         +----------------------------------------------------------------------
>          >>> [ 1347s] +Traceback (most recent call last):
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 399, in launch
>          >>> [ 1347s] +    self._launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 434, in _launch
>          >>> [ 1347s] +    self._post_launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
>         line
>          >>> 147, in _post_launch
>          >>> [ 1347s] +    super()._post_launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 340, in _post_launch
>          >>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>> line 69, in accept
>          >>> [ 1347s] +    timeout
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>> line 42, in _sync
>          >>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
>          >>> [ 1347s] +  File
>         "/usr/lib64/python3.6/asyncio/base_events.py", line
>          >>> 488, in run_until_complete
>          >>> [ 1347s] +    return future.result()
>          >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py",
>         line 362,
>          >>> in wait_for
>          >>> [ 1347s] +    raise futures.TimeoutError()
>          >>> [ 1347s] +concurrent.futures._base.TimeoutError
>          >>>
>          >>>
>          >>> I can see other errors like this, it's the problem of the
>         socket.
>          >>>
>          >>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>          >>> [ 1535s] **
>          >>> [ 1535s]
>          >>>
>         ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
> 
>          >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>          >>>
>          >>>
>          >>> The script is running the command as this:
>          >>> /usr/bin/make -O -j4 check-block V=1
>          >>>
>          >>> I can see the errors on ppc, arm or x86.
>          >>> But I couldn't reproduce it when I run it manually.
>          >>>
>          >>> It will be appreciated if any suggestions. Thanks.
>          >>>
>          >>>
>          >>> On 3/24/22 14:47, Li Zhang wrote:
>          >>>> Hi,
>          >>>>
>          >>>> When I run the testsuit on our buidling system, it reports
>         a timeout
>          >>>> sometimes not always as the following.
>          >>>> It couldn't connect qmp socket. Any ideas about this problem?
>          >>>>
>          >>>>
>          >>>> [ 1989s] ---
>          >>>>
>         /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>          >>>> [ 1989s] +++ 040.out.bad
>          >>>> [ 1989s] @@ -1,5 +1,55 @@
>          >>>> [ 1989s]
>          >>>>
>         -.................................................................
>          >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to
>          >>>> establish connection: asyncio.exceptions.CancelledError
>          >>>> [ 1989s]
>          >>>>
>         +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed
> 
>          >>>> to establish connection: asyncio.exceptions.CancelledError
>          >>>> [ 1989s] +E.........................
>          >>>> [ 1989s]
>          >>>>
>         +======================================================================
>          >>>> [ 1989s] +ERROR: test_commit_node
>         (__main__.TestActiveZeroLengthImage)
>          >>>> [ 1989s]
>          >>>>
>         +----------------------------------------------------------------------
>          >>>> [ 1989s] +Traceback (most recent call last):
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040",
>          >>>> line 94, in setUp
>          >>>> [ 1989s] +    self.vm.launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 399, in launch
>          >>>> [ 1989s] +    self._launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 434, in _launch
>          >>>> [ 1989s] +    self._post_launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
> 
>          >>>> line 147, in _post_launch
>          >>>> [ 1989s] +    super()._post_launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 340, in _post_launch
>          >>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>>> line 67, in accept
>          >>>> [ 1989s] +    self._sync(
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>>> line 41, in _sync
>          >>>> [ 1989s] +    return self._aloop.run_until_complete(
>          >>>> [ 1989s] +  File
>         "/usr/lib64/python3.8/asyncio/base_events.py", line
>          >>>> 616, in run_until_complete
>          >>>> [ 1989s] +    return future.result()
>          >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py",
>         line 501,
>          >>>> in wait_for
>          >>>> [ 1989s] +    raise exceptions.TimeoutError()
>          >>>> [ 1989s] +asyncio.exceptions.TimeoutError
>          >>>>
>          >>>
>          >>>
>          >>
>          >
> 
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-03-31 14:46           ` John Snow
  2022-03-31 15:18             ` Li Zhang
@ 2022-04-06 14:48             ` Li Zhang
  2022-04-06 15:11               ` John Snow
  1 sibling, 1 reply; 11+ messages in thread
From: Li Zhang @ 2022-04-06 14:48 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers

On 3/31/22 16:46, John Snow wrote:
> 
> 
> On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com 
> <mailto:jsnow@redhat.com>> wrote:
> 
> 
> 
>     On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de
>     <mailto:lizhang@suse.de>> wrote:
> 
>         On 3/31/22 08:10, Hanna Reitz wrote:
>          > On 29.03.22 18:49, Li Zhang wrote:
>          >> Update what I observed.
>          >>
>          >> It seems that aqmp is not stable when running test cases.
>          >> So I revert the patches as the following, iotest40/41 test
>         cases work
>          >> well.
>          >
>          > Thanks for bisecting.  I haven’t seen this problem before, so
>         I didn’t
>          > look into it; CC-ing John, perhaps he figures something.
>          >
>         Thanks a lot.
>         It happens occassionally. I didn't find a way to reproduce it
>         manually.
> 
> 
>     Sorry for the instability.
> 
>     A few questions then:
> 
>     - what operating system and version
>     - kernel version
>     - architecture (looks like x86 and others?)
>     - what commit/version of QEMU are you testing?
>     - what python version are you using to run iotests?
> 
> 
> I'll also point out these patchsets, because we're in different 
> timezones and if I'm lucky it will be the answer:
> 
> [1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/ 
> <https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/>
> 
> [2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/ 
> <https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/>

Thanks a lot.

I backport aqmp related patches, it looks good to run the cases 
iotest040/041.

> 
> #1 wasn't sufficient to fix the problem, but I had hoped #2 was. The 
> second patchset was merged somewhat recently, just before rc testing for 
> 7.0.
> 
> Thanks,
> --js
> 
> 
> 
> 
> 
>          > Hanna
>          >
>          >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>          >>
>          >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>>
>          >>
>          >> Date:   Tue Oct 26 13:56:12 2021 -0400
>          >>
>          >>
>          >>
>          >>     python, iotests: replace qmp with aqmp
>          >>
>          >>
>          >>
>          >>     Swap out the synchronous QEMUMonitorProtocol from
>         qemu.qmp with
>          >> the sync
>          >>
>          >>     wrapper from qemu.aqmp instead.
>          >>
>          >>
>          >>
>          >>     Add an escape hatch in the form of the environment variable
>          >>
>          >>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole
>         QEMUMachine into
>          >> using
>          >>
>          >>     the old implementation, proving that both
>         implementations work
>          >>
>          >>     concurrently.
>          >>
>          >>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>     Reviewed-by: Kevin Wolf <kwolf@redhat.com
>         <mailto:kwolf@redhat.com>>
>          >>
>          >>     Reviewed-by: Hanna Reitz <hreitz@redhat.com
>         <mailto:hreitz@redhat.com>>
>          >>
>          >>     Message-id: 20211026175612.4127598-9-jsnow@redhat.com
>         <mailto:20211026175612.4127598-9-jsnow@redhat.com>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>
>          >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>          >>
>          >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>>
>          >>
>          >> Date:   Thu Nov 18 15:46:18 2021 -0500
>          >>
>          >>
>          >>
>          >>     python/machine: handle "fast" QEMU terminations
>          >>
>          >>
>          >>
>          >>     In the case that the QEMU process actually launches --
>         but then
>          >> dies so
>          >>
>          >>     quickly that we can't establish a QMP connection to it
>         -- QEMUMachine
>          >>
>          >>     currently calls _post_shutdown() assuming that it never
>         launched
>          >> the VM
>          >>
>          >>     process.
>          >>
>          >>
>          >>
>          >>     This isn't true, though: it "merely" may have failed to
>         establish
>          >> a QMP
>          >>
>          >>     connection and the process is in the middle of its own
>         exit path.
>          >>
>          >>
>          >>
>          >>     If we don't wait for the subprocess, the caller may get
>         a bogus
>          >> `None`
>          >>
>          >>     return for .exitcode(). This behavior was observed from
>          >>
>          >>     device-crash-test; after the switch to Async QMP, the
>         timings were
>          >>
>          >>     changed such that it was now seemingly possible to
>         witness the
>          >> failure
>          >>
>          >>     of "vm.launch()" *prior* to the exitcode becoming available.
>          >>
>          >>
>          >>
>          >>     The semantic of the `_launched` property is changed in this
>          >>
>          >>     patch. Instead of representing the condition "launch()
>         executed
>          >>
>          >>     successfully", it will now represent "has forked a child
>         process
>          >>
>          >>     successfully". This way, wait() when called in the exit
>         path won't
>          >>
>          >>     become a no-op.
>          >>
>          >>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>     Reviewed-by: Willian Rampazzo <willianr@redhat.com
>         <mailto:willianr@redhat.com>>
>          >>
>          >>     Message-id: 20211118204620.1897674-6-jsnow@redhat.com
>         <mailto:20211118204620.1897674-6-jsnow@redhat.com>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>
>          >>
>          >>
>          >>
>          >> On 3/25/22 11:17, Li Zhang wrote:
>          >>> Hi,
>          >>>
>          >>> I backport some iotests patches to the tree and change timeout.
>          >>> It doesn't work.
>          >>>
>          >>> Sometimes, iotest41 also reports the errors.
>          >>> [ 1347s]
>          >>>
>         +======================================================================
>          >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain
>         (__main__.TestSingleDrive)
>          >>> [ 1347s]
>          >>>
>         +----------------------------------------------------------------------
>          >>> [ 1347s] +Traceback (most recent call last):
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 399, in launch
>          >>> [ 1347s] +    self._launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 434, in _launch
>          >>> [ 1347s] +    self._post_launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
>         line
>          >>> 147, in _post_launch
>          >>> [ 1347s] +    super()._post_launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 340, in _post_launch
>          >>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>> line 69, in accept
>          >>> [ 1347s] +    timeout
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>> line 42, in _sync
>          >>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
>          >>> [ 1347s] +  File
>         "/usr/lib64/python3.6/asyncio/base_events.py", line
>          >>> 488, in run_until_complete
>          >>> [ 1347s] +    return future.result()
>          >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py",
>         line 362,
>          >>> in wait_for
>          >>> [ 1347s] +    raise futures.TimeoutError()
>          >>> [ 1347s] +concurrent.futures._base.TimeoutError
>          >>>
>          >>>
>          >>> I can see other errors like this, it's the problem of the
>         socket.
>          >>>
>          >>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>          >>> [ 1535s] **
>          >>> [ 1535s]
>          >>>
>         ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
> 
>          >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>          >>>
>          >>>
>          >>> The script is running the command as this:
>          >>> /usr/bin/make -O -j4 check-block V=1
>          >>>
>          >>> I can see the errors on ppc, arm or x86.
>          >>> But I couldn't reproduce it when I run it manually.
>          >>>
>          >>> It will be appreciated if any suggestions. Thanks.
>          >>>
>          >>>
>          >>> On 3/24/22 14:47, Li Zhang wrote:
>          >>>> Hi,
>          >>>>
>          >>>> When I run the testsuit on our buidling system, it reports
>         a timeout
>          >>>> sometimes not always as the following.
>          >>>> It couldn't connect qmp socket. Any ideas about this problem?
>          >>>>
>          >>>>
>          >>>> [ 1989s] ---
>          >>>>
>         /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>          >>>> [ 1989s] +++ 040.out.bad
>          >>>> [ 1989s] @@ -1,5 +1,55 @@
>          >>>> [ 1989s]
>          >>>>
>         -.................................................................
>          >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to
>          >>>> establish connection: asyncio.exceptions.CancelledError
>          >>>> [ 1989s]
>          >>>>
>         +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed
> 
>          >>>> to establish connection: asyncio.exceptions.CancelledError
>          >>>> [ 1989s] +E.........................
>          >>>> [ 1989s]
>          >>>>
>         +======================================================================
>          >>>> [ 1989s] +ERROR: test_commit_node
>         (__main__.TestActiveZeroLengthImage)
>          >>>> [ 1989s]
>          >>>>
>         +----------------------------------------------------------------------
>          >>>> [ 1989s] +Traceback (most recent call last):
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040",
>          >>>> line 94, in setUp
>          >>>> [ 1989s] +    self.vm.launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 399, in launch
>          >>>> [ 1989s] +    self._launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 434, in _launch
>          >>>> [ 1989s] +    self._post_launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
> 
>          >>>> line 147, in _post_launch
>          >>>> [ 1989s] +    super()._post_launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 340, in _post_launch
>          >>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>>> line 67, in accept
>          >>>> [ 1989s] +    self._sync(
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>>> line 41, in _sync
>          >>>> [ 1989s] +    return self._aloop.run_until_complete(
>          >>>> [ 1989s] +  File
>         "/usr/lib64/python3.8/asyncio/base_events.py", line
>          >>>> 616, in run_until_complete
>          >>>> [ 1989s] +    return future.result()
>          >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py",
>         line 501,
>          >>>> in wait_for
>          >>>> [ 1989s] +    raise exceptions.TimeoutError()
>          >>>> [ 1989s] +asyncio.exceptions.TimeoutError
>          >>>>
>          >>>
>          >>>
>          >>
>          >
> 
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: iotest40 problem
  2022-04-06 14:48             ` Li Zhang
@ 2022-04-06 15:11               ` John Snow
  0 siblings, 0 replies; 11+ messages in thread
From: John Snow @ 2022-04-06 15:11 UTC (permalink / raw)
  To: Li Zhang; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers

[-- Attachment #1: Type: text/plain, Size: 1964 bytes --]

On Wed, Apr 6, 2022, 10:53 AM Li Zhang <lizhang@suse.de> wrote:

> On 3/31/22 16:46, John Snow wrote:
> >
> >
> > On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com
> > <mailto:jsnow@redhat.com>> wrote:
> >
> >
> >
> >     On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de
> >     <mailto:lizhang@suse.de>> wrote:
> >
> >         On 3/31/22 08:10, Hanna Reitz wrote:
> >          > On 29.03.22 18:49, Li Zhang wrote:
> >          >> Update what I observed.
> >          >>
> >          >> It seems that aqmp is not stable when running test cases.
> >          >> So I revert the patches as the following, iotest40/41 test
> >         cases work
> >          >> well.
> >          >
> >          > Thanks for bisecting.  I haven’t seen this problem before, so
> >         I didn’t
> >          > look into it; CC-ing John, perhaps he figures something.
> >          >
> >         Thanks a lot.
> >         It happens occassionally. I didn't find a way to reproduce it
> >         manually.
> >
> >
> >     Sorry for the instability.
> >
> >     A few questions then:
> >
> >     - what operating system and version
> >     - kernel version
> >     - architecture (looks like x86 and others?)
> >     - what commit/version of QEMU are you testing?
> >     - what python version are you using to run iotests?
> >
> >
> > I'll also point out these patchsets, because we're in different
> > timezones and if I'm lucky it will be the answer:
> >
> > [1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/
> > <https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/>
> >
> > [2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/
> > <https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/>
>
> Thanks a lot.
>
> I backport aqmp related patches, it looks good to run the cases
> iotest040/041.
>

Excellent, thanks for reporting back!

[-- Attachment #2: Type: text/html, Size: 3534 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-04-06 15:13 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-24 13:47 iotest40 problem Li Zhang
2022-03-25 10:17 ` Li Zhang
2022-03-29 16:49   ` Li Zhang
2022-03-31  6:10     ` Hanna Reitz
2022-03-31 10:44       ` Li Zhang
2022-03-31 14:37         ` John Snow
2022-03-31 14:46           ` John Snow
2022-03-31 15:18             ` Li Zhang
2022-04-06 14:48             ` Li Zhang
2022-04-06 15:11               ` John Snow
2022-03-31 15:16           ` Li Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).