* iotest40 problem @ 2022-03-24 13:47 Li Zhang 2022-03-25 10:17 ` Li Zhang 0 siblings, 1 reply; 11+ messages in thread From: Li Zhang @ 2022-03-24 13:47 UTC (permalink / raw) To: QEMU Developers Hi, When I run the testsuit on our buidling system, it reports a timeout sometimes not always as the following. It couldn't connect qmp socket. Any ideas about this problem? [ 1989s] --- /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out [ 1989s] +++ 040.out.bad [ 1989s] @@ -1,5 +1,55 @@ [ 1989s] -................................................................. [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to establish connection: asyncio.exceptions.CancelledError [ 1989s] +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to establish connection: asyncio.exceptions.CancelledError [ 1989s] +E......................... [ 1989s] +====================================================================== [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage) [ 1989s] +---------------------------------------------------------------------- [ 1989s] +Traceback (most recent call last): [ 1989s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", line 94, in setUp [ 1989s] + self.vm.launch() [ 1989s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", line 399, in launch [ 1989s] + self._launch() [ 1989s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", line 434, in _launch [ 1989s] + self._post_launch() [ 1989s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", line 147, in _post_launch [ 1989s] + super()._post_launch() [ 1989s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", line 340, in _post_launch [ 1989s] + self._qmp.accept(self._qmp_timer) [ 1989s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", line 67, in accept [ 1989s] + self._sync( [ 1989s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", line 41, in _sync [ 1989s] + return self._aloop.run_until_complete( [ 1989s] + File "/usr/lib64/python3.8/asyncio/base_events.py", line 616, in run_until_complete [ 1989s] + return future.result() [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, in wait_for [ 1989s] + raise exceptions.TimeoutError() [ 1989s] +asyncio.exceptions.TimeoutError ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-24 13:47 iotest40 problem Li Zhang @ 2022-03-25 10:17 ` Li Zhang 2022-03-29 16:49 ` Li Zhang 0 siblings, 1 reply; 11+ messages in thread From: Li Zhang @ 2022-03-25 10:17 UTC (permalink / raw) To: QEMU Developers, kwolf, Hanna Reitz Hi, I backport some iotests patches to the tree and change timeout. It doesn't work. Sometimes, iotest41 also reports the errors. [ 1347s] +====================================================================== [ 1347s] +ERROR: test_top_node_in_wrong_chain (__main__.TestSingleDrive) [ 1347s] +---------------------------------------------------------------------- [ 1347s] +Traceback (most recent call last): [ 1347s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", line 399, in launch [ 1347s] + self._launch() [ 1347s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", line 434, in _launch [ 1347s] + self._post_launch() [ 1347s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", line 147, in _post_launch [ 1347s] + super()._post_launch() [ 1347s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", line 340, in _post_launch [ 1347s] + self._qmp.accept(self._qmp_timer) [ 1347s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", line 69, in accept [ 1347s] + timeout [ 1347s] + File "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", line 42, in _sync [ 1347s] + asyncio.wait_for(future, timeout=timeout) [ 1347s] + File "/usr/lib64/python3.6/asyncio/base_events.py", line 488, in run_until_complete [ 1347s] + return future.result() [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, in wait_for [ 1347s] + raise futures.TimeoutError() [ 1347s] +concurrent.futures._base.TimeoutError I can see other errors like this, it's the problem of the socket. [ 1535s] socket_accept failed: Resource temporarily unavailable [ 1535s] ** [ 1535s] ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) The script is running the command as this: /usr/bin/make -O -j4 check-block V=1 I can see the errors on ppc, arm or x86. But I couldn't reproduce it when I run it manually. It will be appreciated if any suggestions. Thanks. On 3/24/22 14:47, Li Zhang wrote: > Hi, > > When I run the testsuit on our buidling system, it reports a timeout > sometimes not always as the following. > It couldn't connect qmp socket. Any ideas about this problem? > > > [ 1989s] --- > /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out > [ 1989s] +++ 040.out.bad > [ 1989s] @@ -1,5 +1,55 @@ > [ 1989s] -................................................................. > [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to establish > connection: asyncio.exceptions.CancelledError > [ 1989s] > +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed > to establish connection: asyncio.exceptions.CancelledError > [ 1989s] +E......................... > [ 1989s] > +====================================================================== > [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage) > [ 1989s] > +---------------------------------------------------------------------- > [ 1989s] +Traceback (most recent call last): > [ 1989s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", line > 94, in setUp > [ 1989s] + self.vm.launch() > [ 1989s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > line 399, in launch > [ 1989s] + self._launch() > [ 1989s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > line 434, in _launch > [ 1989s] + self._post_launch() > [ 1989s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > line 147, in _post_launch > [ 1989s] + super()._post_launch() > [ 1989s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > line 340, in _post_launch > [ 1989s] + self._qmp.accept(self._qmp_timer) > [ 1989s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > line 67, in accept > [ 1989s] + self._sync( > [ 1989s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > line 41, in _sync > [ 1989s] + return self._aloop.run_until_complete( > [ 1989s] + File "/usr/lib64/python3.8/asyncio/base_events.py", line > 616, in run_until_complete > [ 1989s] + return future.result() > [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, in > wait_for > [ 1989s] + raise exceptions.TimeoutError() > [ 1989s] +asyncio.exceptions.TimeoutError > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-25 10:17 ` Li Zhang @ 2022-03-29 16:49 ` Li Zhang 2022-03-31 6:10 ` Hanna Reitz 0 siblings, 1 reply; 11+ messages in thread From: Li Zhang @ 2022-03-29 16:49 UTC (permalink / raw) To: QEMU Developers, kwolf, Hanna Reitz Update what I observed. It seems that aqmp is not stable when running test cases. So I revert the patches as the following, iotest40/41 test cases work well. commit 76cd358671e6b8e7c435ec65b1c44200254514a9 Author: John Snow <jsnow@redhat.com> Date: Tue Oct 26 13:56:12 2021 -0400 python, iotests: replace qmp with aqmp Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with the sync wrapper from qemu.aqmp instead. Add an escape hatch in the form of the environment variable QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into using the old implementation, proving that both implementations work concurrently. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-id: 20211026175612.4127598-9-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 Author: John Snow <jsnow@redhat.com> Date: Thu Nov 18 15:46:18 2021 -0500 python/machine: handle "fast" QEMU terminations In the case that the QEMU process actually launches -- but then dies so quickly that we can't establish a QMP connection to it -- QEMUMachine currently calls _post_shutdown() assuming that it never launched the VM process. This isn't true, though: it "merely" may have failed to establish a QMP connection and the process is in the middle of its own exit path. If we don't wait for the subprocess, the caller may get a bogus `None` return for .exitcode(). This behavior was observed from device-crash-test; after the switch to Async QMP, the timings were changed such that it was now seemingly possible to witness the failure of "vm.launch()" *prior* to the exitcode becoming available. The semantic of the `_launched` property is changed in this patch. Instead of representing the condition "launch() executed successfully", it will now represent "has forked a child process successfully". This way, wait() when called in the exit path won't become a no-op. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Willian Rampazzo <willianr@redhat.com> Message-id: 20211118204620.1897674-6-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com> On 3/25/22 11:17, Li Zhang wrote: > Hi, > > I backport some iotests patches to the tree and change timeout. > It doesn't work. > > Sometimes, iotest41 also reports the errors. > [ 1347s] > +====================================================================== > [ 1347s] +ERROR: test_top_node_in_wrong_chain (__main__.TestSingleDrive) > [ 1347s] > +---------------------------------------------------------------------- > [ 1347s] +Traceback (most recent call last): > [ 1347s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > line 399, in launch > [ 1347s] + self._launch() > [ 1347s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > line 434, in _launch > [ 1347s] + self._post_launch() > [ 1347s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > line 147, in _post_launch > [ 1347s] + super()._post_launch() > [ 1347s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > line 340, in _post_launch > [ 1347s] + self._qmp.accept(self._qmp_timer) > [ 1347s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > line 69, in accept > [ 1347s] + timeout > [ 1347s] + File > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > line 42, in _sync > [ 1347s] + asyncio.wait_for(future, timeout=timeout) > [ 1347s] + File "/usr/lib64/python3.6/asyncio/base_events.py", line > 488, in run_until_complete > [ 1347s] + return future.result() > [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, in > wait_for > [ 1347s] + raise futures.TimeoutError() > [ 1347s] +concurrent.futures._base.TimeoutError > > > I can see other errors like this, it's the problem of the socket. > > [ 1535s] socket_accept failed: Resource temporarily unavailable > [ 1535s] ** > [ 1535s] > ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: > assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) > > > The script is running the command as this: > /usr/bin/make -O -j4 check-block V=1 > > I can see the errors on ppc, arm or x86. > But I couldn't reproduce it when I run it manually. > > It will be appreciated if any suggestions. Thanks. > > > On 3/24/22 14:47, Li Zhang wrote: >> Hi, >> >> When I run the testsuit on our buidling system, it reports a timeout >> sometimes not always as the following. >> It couldn't connect qmp socket. Any ideas about this problem? >> >> >> [ 1989s] --- >> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out >> [ 1989s] +++ 040.out.bad >> [ 1989s] @@ -1,5 +1,55 @@ >> [ 1989s] >> -................................................................. >> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to establish >> connection: asyncio.exceptions.CancelledError >> [ 1989s] >> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed >> to establish connection: asyncio.exceptions.CancelledError >> [ 1989s] +E......................... >> [ 1989s] >> +====================================================================== >> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage) >> [ 1989s] >> +---------------------------------------------------------------------- >> [ 1989s] +Traceback (most recent call last): >> [ 1989s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", line >> 94, in setUp >> [ 1989s] + self.vm.launch() >> [ 1989s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> line 399, in launch >> [ 1989s] + self._launch() >> [ 1989s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> line 434, in _launch >> [ 1989s] + self._post_launch() >> [ 1989s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", >> line 147, in _post_launch >> [ 1989s] + super()._post_launch() >> [ 1989s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> line 340, in _post_launch >> [ 1989s] + self._qmp.accept(self._qmp_timer) >> [ 1989s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> line 67, in accept >> [ 1989s] + self._sync( >> [ 1989s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> line 41, in _sync >> [ 1989s] + return self._aloop.run_until_complete( >> [ 1989s] + File "/usr/lib64/python3.8/asyncio/base_events.py", line >> 616, in run_until_complete >> [ 1989s] + return future.result() >> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, in >> wait_for >> [ 1989s] + raise exceptions.TimeoutError() >> [ 1989s] +asyncio.exceptions.TimeoutError >> > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-29 16:49 ` Li Zhang @ 2022-03-31 6:10 ` Hanna Reitz 2022-03-31 10:44 ` Li Zhang 0 siblings, 1 reply; 11+ messages in thread From: Hanna Reitz @ 2022-03-31 6:10 UTC (permalink / raw) To: Li Zhang, QEMU Developers, kwolf, John Snow On 29.03.22 18:49, Li Zhang wrote: > Update what I observed. > > It seems that aqmp is not stable when running test cases. > So I revert the patches as the following, iotest40/41 test cases work > well. Thanks for bisecting. I haven’t seen this problem before, so I didn’t look into it; CC-ing John, perhaps he figures something. Hanna > commit 76cd358671e6b8e7c435ec65b1c44200254514a9 > > Author: John Snow <jsnow@redhat.com> > > Date: Tue Oct 26 13:56:12 2021 -0400 > > > > python, iotests: replace qmp with aqmp > > > > Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with > the sync > > wrapper from qemu.aqmp instead. > > > > Add an escape hatch in the form of the environment variable > > QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into > using > > the old implementation, proving that both implementations work > > concurrently. > > > > Signed-off-by: John Snow <jsnow@redhat.com> > > Reviewed-by: Kevin Wolf <kwolf@redhat.com> > > Reviewed-by: Hanna Reitz <hreitz@redhat.com> > > Message-id: 20211026175612.4127598-9-jsnow@redhat.com > > Signed-off-by: John Snow <jsnow@redhat.com> > > > commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 > > Author: John Snow <jsnow@redhat.com> > > Date: Thu Nov 18 15:46:18 2021 -0500 > > > > python/machine: handle "fast" QEMU terminations > > > > In the case that the QEMU process actually launches -- but then > dies so > > quickly that we can't establish a QMP connection to it -- QEMUMachine > > currently calls _post_shutdown() assuming that it never launched > the VM > > process. > > > > This isn't true, though: it "merely" may have failed to establish > a QMP > > connection and the process is in the middle of its own exit path. > > > > If we don't wait for the subprocess, the caller may get a bogus > `None` > > return for .exitcode(). This behavior was observed from > > device-crash-test; after the switch to Async QMP, the timings were > > changed such that it was now seemingly possible to witness the > failure > > of "vm.launch()" *prior* to the exitcode becoming available. > > > > The semantic of the `_launched` property is changed in this > > patch. Instead of representing the condition "launch() executed > > successfully", it will now represent "has forked a child process > > successfully". This way, wait() when called in the exit path won't > > become a no-op. > > > > Signed-off-by: John Snow <jsnow@redhat.com> > > Reviewed-by: Willian Rampazzo <willianr@redhat.com> > > Message-id: 20211118204620.1897674-6-jsnow@redhat.com > > Signed-off-by: John Snow <jsnow@redhat.com> > > > > > > On 3/25/22 11:17, Li Zhang wrote: >> Hi, >> >> I backport some iotests patches to the tree and change timeout. >> It doesn't work. >> >> Sometimes, iotest41 also reports the errors. >> [ 1347s] >> +====================================================================== >> [ 1347s] +ERROR: test_top_node_in_wrong_chain (__main__.TestSingleDrive) >> [ 1347s] >> +---------------------------------------------------------------------- >> [ 1347s] +Traceback (most recent call last): >> [ 1347s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> line 399, in launch >> [ 1347s] + self._launch() >> [ 1347s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> line 434, in _launch >> [ 1347s] + self._post_launch() >> [ 1347s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", >> line 147, in _post_launch >> [ 1347s] + super()._post_launch() >> [ 1347s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> line 340, in _post_launch >> [ 1347s] + self._qmp.accept(self._qmp_timer) >> [ 1347s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> line 69, in accept >> [ 1347s] + timeout >> [ 1347s] + File >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> line 42, in _sync >> [ 1347s] + asyncio.wait_for(future, timeout=timeout) >> [ 1347s] + File "/usr/lib64/python3.6/asyncio/base_events.py", line >> 488, in run_until_complete >> [ 1347s] + return future.result() >> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, >> in wait_for >> [ 1347s] + raise futures.TimeoutError() >> [ 1347s] +concurrent.futures._base.TimeoutError >> >> >> I can see other errors like this, it's the problem of the socket. >> >> [ 1535s] socket_accept failed: Resource temporarily unavailable >> [ 1535s] ** >> [ 1535s] >> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: >> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) >> >> >> The script is running the command as this: >> /usr/bin/make -O -j4 check-block V=1 >> >> I can see the errors on ppc, arm or x86. >> But I couldn't reproduce it when I run it manually. >> >> It will be appreciated if any suggestions. Thanks. >> >> >> On 3/24/22 14:47, Li Zhang wrote: >>> Hi, >>> >>> When I run the testsuit on our buidling system, it reports a timeout >>> sometimes not always as the following. >>> It couldn't connect qmp socket. Any ideas about this problem? >>> >>> >>> [ 1989s] --- >>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out >>> [ 1989s] +++ 040.out.bad >>> [ 1989s] @@ -1,5 +1,55 @@ >>> [ 1989s] >>> -................................................................. >>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to >>> establish connection: asyncio.exceptions.CancelledError >>> [ 1989s] >>> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed >>> to establish connection: asyncio.exceptions.CancelledError >>> [ 1989s] +E......................... >>> [ 1989s] >>> +====================================================================== >>> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage) >>> [ 1989s] >>> +---------------------------------------------------------------------- >>> [ 1989s] +Traceback (most recent call last): >>> [ 1989s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", >>> line 94, in setUp >>> [ 1989s] + self.vm.launch() >>> [ 1989s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>> line 399, in launch >>> [ 1989s] + self._launch() >>> [ 1989s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>> line 434, in _launch >>> [ 1989s] + self._post_launch() >>> [ 1989s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", >>> line 147, in _post_launch >>> [ 1989s] + super()._post_launch() >>> [ 1989s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>> line 340, in _post_launch >>> [ 1989s] + self._qmp.accept(self._qmp_timer) >>> [ 1989s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >>> line 67, in accept >>> [ 1989s] + self._sync( >>> [ 1989s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >>> line 41, in _sync >>> [ 1989s] + return self._aloop.run_until_complete( >>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/base_events.py", line >>> 616, in run_until_complete >>> [ 1989s] + return future.result() >>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, >>> in wait_for >>> [ 1989s] + raise exceptions.TimeoutError() >>> [ 1989s] +asyncio.exceptions.TimeoutError >>> >> >> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-31 6:10 ` Hanna Reitz @ 2022-03-31 10:44 ` Li Zhang 2022-03-31 14:37 ` John Snow 0 siblings, 1 reply; 11+ messages in thread From: Li Zhang @ 2022-03-31 10:44 UTC (permalink / raw) To: Hanna Reitz, QEMU Developers, kwolf, John Snow On 3/31/22 08:10, Hanna Reitz wrote: > On 29.03.22 18:49, Li Zhang wrote: >> Update what I observed. >> >> It seems that aqmp is not stable when running test cases. >> So I revert the patches as the following, iotest40/41 test cases work >> well. > > Thanks for bisecting. I haven’t seen this problem before, so I didn’t > look into it; CC-ing John, perhaps he figures something. > Thanks a lot. It happens occassionally. I didn't find a way to reproduce it manually. > Hanna > >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9 >> >> Author: John Snow <jsnow@redhat.com> >> >> Date: Tue Oct 26 13:56:12 2021 -0400 >> >> >> >> python, iotests: replace qmp with aqmp >> >> >> >> Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with >> the sync >> >> wrapper from qemu.aqmp instead. >> >> >> >> Add an escape hatch in the form of the environment variable >> >> QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into >> using >> >> the old implementation, proving that both implementations work >> >> concurrently. >> >> >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> >> Reviewed-by: Kevin Wolf <kwolf@redhat.com> >> >> Reviewed-by: Hanna Reitz <hreitz@redhat.com> >> >> Message-id: 20211026175612.4127598-9-jsnow@redhat.com >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> >> >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 >> >> Author: John Snow <jsnow@redhat.com> >> >> Date: Thu Nov 18 15:46:18 2021 -0500 >> >> >> >> python/machine: handle "fast" QEMU terminations >> >> >> >> In the case that the QEMU process actually launches -- but then >> dies so >> >> quickly that we can't establish a QMP connection to it -- QEMUMachine >> >> currently calls _post_shutdown() assuming that it never launched >> the VM >> >> process. >> >> >> >> This isn't true, though: it "merely" may have failed to establish >> a QMP >> >> connection and the process is in the middle of its own exit path. >> >> >> >> If we don't wait for the subprocess, the caller may get a bogus >> `None` >> >> return for .exitcode(). This behavior was observed from >> >> device-crash-test; after the switch to Async QMP, the timings were >> >> changed such that it was now seemingly possible to witness the >> failure >> >> of "vm.launch()" *prior* to the exitcode becoming available. >> >> >> >> The semantic of the `_launched` property is changed in this >> >> patch. Instead of representing the condition "launch() executed >> >> successfully", it will now represent "has forked a child process >> >> successfully". This way, wait() when called in the exit path won't >> >> become a no-op. >> >> >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> >> Reviewed-by: Willian Rampazzo <willianr@redhat.com> >> >> Message-id: 20211118204620.1897674-6-jsnow@redhat.com >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> >> >> >> >> >> On 3/25/22 11:17, Li Zhang wrote: >>> Hi, >>> >>> I backport some iotests patches to the tree and change timeout. >>> It doesn't work. >>> >>> Sometimes, iotest41 also reports the errors. >>> [ 1347s] >>> +====================================================================== >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain (__main__.TestSingleDrive) >>> [ 1347s] >>> +---------------------------------------------------------------------- >>> [ 1347s] +Traceback (most recent call last): >>> [ 1347s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>> line 399, in launch >>> [ 1347s] + self._launch() >>> [ 1347s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>> line 434, in _launch >>> [ 1347s] + self._post_launch() >>> [ 1347s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", line >>> 147, in _post_launch >>> [ 1347s] + super()._post_launch() >>> [ 1347s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>> line 340, in _post_launch >>> [ 1347s] + self._qmp.accept(self._qmp_timer) >>> [ 1347s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >>> line 69, in accept >>> [ 1347s] + timeout >>> [ 1347s] + File >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >>> line 42, in _sync >>> [ 1347s] + asyncio.wait_for(future, timeout=timeout) >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/base_events.py", line >>> 488, in run_until_complete >>> [ 1347s] + return future.result() >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, >>> in wait_for >>> [ 1347s] + raise futures.TimeoutError() >>> [ 1347s] +concurrent.futures._base.TimeoutError >>> >>> >>> I can see other errors like this, it's the problem of the socket. >>> >>> [ 1535s] socket_accept failed: Resource temporarily unavailable >>> [ 1535s] ** >>> [ 1535s] >>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) >>> >>> >>> The script is running the command as this: >>> /usr/bin/make -O -j4 check-block V=1 >>> >>> I can see the errors on ppc, arm or x86. >>> But I couldn't reproduce it when I run it manually. >>> >>> It will be appreciated if any suggestions. Thanks. >>> >>> >>> On 3/24/22 14:47, Li Zhang wrote: >>>> Hi, >>>> >>>> When I run the testsuit on our buidling system, it reports a timeout >>>> sometimes not always as the following. >>>> It couldn't connect qmp socket. Any ideas about this problem? >>>> >>>> >>>> [ 1989s] --- >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out >>>> [ 1989s] +++ 040.out.bad >>>> [ 1989s] @@ -1,5 +1,55 @@ >>>> [ 1989s] >>>> -................................................................. >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to >>>> establish connection: asyncio.exceptions.CancelledError >>>> [ 1989s] >>>> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed >>>> to establish connection: asyncio.exceptions.CancelledError >>>> [ 1989s] +E......................... >>>> [ 1989s] >>>> +====================================================================== >>>> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage) >>>> [ 1989s] >>>> +---------------------------------------------------------------------- >>>> [ 1989s] +Traceback (most recent call last): >>>> [ 1989s] + File >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", >>>> line 94, in setUp >>>> [ 1989s] + self.vm.launch() >>>> [ 1989s] + File >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>>> line 399, in launch >>>> [ 1989s] + self._launch() >>>> [ 1989s] + File >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>>> line 434, in _launch >>>> [ 1989s] + self._post_launch() >>>> [ 1989s] + File >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", >>>> line 147, in _post_launch >>>> [ 1989s] + super()._post_launch() >>>> [ 1989s] + File >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >>>> line 340, in _post_launch >>>> [ 1989s] + self._qmp.accept(self._qmp_timer) >>>> [ 1989s] + File >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >>>> line 67, in accept >>>> [ 1989s] + self._sync( >>>> [ 1989s] + File >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >>>> line 41, in _sync >>>> [ 1989s] + return self._aloop.run_until_complete( >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/base_events.py", line >>>> 616, in run_until_complete >>>> [ 1989s] + return future.result() >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, >>>> in wait_for >>>> [ 1989s] + raise exceptions.TimeoutError() >>>> [ 1989s] +asyncio.exceptions.TimeoutError >>>> >>> >>> >> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-31 10:44 ` Li Zhang @ 2022-03-31 14:37 ` John Snow 2022-03-31 14:46 ` John Snow 2022-03-31 15:16 ` Li Zhang 0 siblings, 2 replies; 11+ messages in thread From: John Snow @ 2022-03-31 14:37 UTC (permalink / raw) To: Li Zhang; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 9146 bytes --] On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de> wrote: > On 3/31/22 08:10, Hanna Reitz wrote: > > On 29.03.22 18:49, Li Zhang wrote: > >> Update what I observed. > >> > >> It seems that aqmp is not stable when running test cases. > >> So I revert the patches as the following, iotest40/41 test cases work > >> well. > > > > Thanks for bisecting. I haven’t seen this problem before, so I didn’t > > look into it; CC-ing John, perhaps he figures something. > > > Thanks a lot. > It happens occassionally. I didn't find a way to reproduce it manually. > Sorry for the instability. A few questions then: - what operating system and version - kernel version - architecture (looks like x86 and others?) - what commit/version of QEMU are you testing? - what python version are you using to run iotests? > > Hanna > > > >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9 > >> > >> Author: John Snow <jsnow@redhat.com> > >> > >> Date: Tue Oct 26 13:56:12 2021 -0400 > >> > >> > >> > >> python, iotests: replace qmp with aqmp > >> > >> > >> > >> Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with > >> the sync > >> > >> wrapper from qemu.aqmp instead. > >> > >> > >> > >> Add an escape hatch in the form of the environment variable > >> > >> QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into > >> using > >> > >> the old implementation, proving that both implementations work > >> > >> concurrently. > >> > >> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com> > >> > >> Reviewed-by: Kevin Wolf <kwolf@redhat.com> > >> > >> Reviewed-by: Hanna Reitz <hreitz@redhat.com> > >> > >> Message-id: 20211026175612.4127598-9-jsnow@redhat.com > >> > >> Signed-off-by: John Snow <jsnow@redhat.com> > >> > >> > >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 > >> > >> Author: John Snow <jsnow@redhat.com> > >> > >> Date: Thu Nov 18 15:46:18 2021 -0500 > >> > >> > >> > >> python/machine: handle "fast" QEMU terminations > >> > >> > >> > >> In the case that the QEMU process actually launches -- but then > >> dies so > >> > >> quickly that we can't establish a QMP connection to it -- > QEMUMachine > >> > >> currently calls _post_shutdown() assuming that it never launched > >> the VM > >> > >> process. > >> > >> > >> > >> This isn't true, though: it "merely" may have failed to establish > >> a QMP > >> > >> connection and the process is in the middle of its own exit path. > >> > >> > >> > >> If we don't wait for the subprocess, the caller may get a bogus > >> `None` > >> > >> return for .exitcode(). This behavior was observed from > >> > >> device-crash-test; after the switch to Async QMP, the timings were > >> > >> changed such that it was now seemingly possible to witness the > >> failure > >> > >> of "vm.launch()" *prior* to the exitcode becoming available. > >> > >> > >> > >> The semantic of the `_launched` property is changed in this > >> > >> patch. Instead of representing the condition "launch() executed > >> > >> successfully", it will now represent "has forked a child process > >> > >> successfully". This way, wait() when called in the exit path won't > >> > >> become a no-op. > >> > >> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com> > >> > >> Reviewed-by: Willian Rampazzo <willianr@redhat.com> > >> > >> Message-id: 20211118204620.1897674-6-jsnow@redhat.com > >> > >> Signed-off-by: John Snow <jsnow@redhat.com> > >> > >> > >> > >> > >> > >> On 3/25/22 11:17, Li Zhang wrote: > >>> Hi, > >>> > >>> I backport some iotests patches to the tree and change timeout. > >>> It doesn't work. > >>> > >>> Sometimes, iotest41 also reports the errors. > >>> [ 1347s] > >>> +====================================================================== > >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain > (__main__.TestSingleDrive) > >>> [ 1347s] > >>> +---------------------------------------------------------------------- > >>> [ 1347s] +Traceback (most recent call last): > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > >>> line 399, in launch > >>> [ 1347s] + self._launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > >>> line 434, in _launch > >>> [ 1347s] + self._post_launch() > >>> [ 1347s] + File > >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > line > >>> 147, in _post_launch > >>> [ 1347s] + super()._post_launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > >>> line 340, in _post_launch > >>> [ 1347s] + self._qmp.accept(self._qmp_timer) > >>> [ 1347s] + File > >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > >>> line 69, in accept > >>> [ 1347s] + timeout > >>> [ 1347s] + File > >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > >>> line 42, in _sync > >>> [ 1347s] + asyncio.wait_for(future, timeout=timeout) > >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/base_events.py", line > >>> 488, in run_until_complete > >>> [ 1347s] + return future.result() > >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, > >>> in wait_for > >>> [ 1347s] + raise futures.TimeoutError() > >>> [ 1347s] +concurrent.futures._base.TimeoutError > >>> > >>> > >>> I can see other errors like this, it's the problem of the socket. > >>> > >>> [ 1535s] socket_accept failed: Resource temporarily unavailable > >>> [ 1535s] ** > >>> [ 1535s] > >>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: > >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) > >>> > >>> > >>> The script is running the command as this: > >>> /usr/bin/make -O -j4 check-block V=1 > >>> > >>> I can see the errors on ppc, arm or x86. > >>> But I couldn't reproduce it when I run it manually. > >>> > >>> It will be appreciated if any suggestions. Thanks. > >>> > >>> > >>> On 3/24/22 14:47, Li Zhang wrote: > >>>> Hi, > >>>> > >>>> When I run the testsuit on our buidling system, it reports a timeout > >>>> sometimes not always as the following. > >>>> It couldn't connect qmp socket. Any ideas about this problem? > >>>> > >>>> > >>>> [ 1989s] --- > >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out > >>>> [ 1989s] +++ 040.out.bad > >>>> [ 1989s] @@ -1,5 +1,55 @@ > >>>> [ 1989s] > >>>> -................................................................. > >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to > >>>> establish connection: asyncio.exceptions.CancelledError > >>>> [ 1989s] > >>>> > +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed > > >>>> to establish connection: asyncio.exceptions.CancelledError > >>>> [ 1989s] +E......................... > >>>> [ 1989s] > >>>> > +====================================================================== > >>>> [ 1989s] +ERROR: test_commit_node (__main__.TestActiveZeroLengthImage) > >>>> [ 1989s] > >>>> > +---------------------------------------------------------------------- > >>>> [ 1989s] +Traceback (most recent call last): > >>>> [ 1989s] + File > >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", > >>>> line 94, in setUp > >>>> [ 1989s] + self.vm.launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > >>>> line 399, in launch > >>>> [ 1989s] + self._launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > >>>> line 434, in _launch > >>>> [ 1989s] + self._post_launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > >>>> line 147, in _post_launch > >>>> [ 1989s] + super()._post_launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > >>>> line 340, in _post_launch > >>>> [ 1989s] + self._qmp.accept(self._qmp_timer) > >>>> [ 1989s] + File > >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > >>>> line 67, in accept > >>>> [ 1989s] + self._sync( > >>>> [ 1989s] + File > >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > >>>> line 41, in _sync > >>>> [ 1989s] + return self._aloop.run_until_complete( > >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/base_events.py", line > >>>> 616, in run_until_complete > >>>> [ 1989s] + return future.result() > >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, > >>>> in wait_for > >>>> [ 1989s] + raise exceptions.TimeoutError() > >>>> [ 1989s] +asyncio.exceptions.TimeoutError > >>>> > >>> > >>> > >> > > > > > [-- Attachment #2: Type: text/html, Size: 13918 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-31 14:37 ` John Snow @ 2022-03-31 14:46 ` John Snow 2022-03-31 15:18 ` Li Zhang 2022-04-06 14:48 ` Li Zhang 2022-03-31 15:16 ` Li Zhang 1 sibling, 2 replies; 11+ messages in thread From: John Snow @ 2022-03-31 14:46 UTC (permalink / raw) To: Li Zhang; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 9972 bytes --] On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com> wrote: > > > On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de> wrote: > >> On 3/31/22 08:10, Hanna Reitz wrote: >> > On 29.03.22 18:49, Li Zhang wrote: >> >> Update what I observed. >> >> >> >> It seems that aqmp is not stable when running test cases. >> >> So I revert the patches as the following, iotest40/41 test cases work >> >> well. >> > >> > Thanks for bisecting. I haven’t seen this problem before, so I didn’t >> > look into it; CC-ing John, perhaps he figures something. >> > >> Thanks a lot. >> It happens occassionally. I didn't find a way to reproduce it manually. >> > > Sorry for the instability. > > A few questions then: > > - what operating system and version > - kernel version > - architecture (looks like x86 and others?) > - what commit/version of QEMU are you testing? > - what python version are you using to run iotests? > I'll also point out these patchsets, because we're in different timezones and if I'm lucky it will be the answer: [1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/ [2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/ #1 wasn't sufficient to fix the problem, but I had hoped #2 was. The second patchset was merged somewhat recently, just before rc testing for 7.0. Thanks, --js > > >> > Hanna >> > >> >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9 >> >> >> >> Author: John Snow <jsnow@redhat.com> >> >> >> >> Date: Tue Oct 26 13:56:12 2021 -0400 >> >> >> >> >> >> >> >> python, iotests: replace qmp with aqmp >> >> >> >> >> >> >> >> Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with >> >> the sync >> >> >> >> wrapper from qemu.aqmp instead. >> >> >> >> >> >> >> >> Add an escape hatch in the form of the environment variable >> >> >> >> QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into >> >> using >> >> >> >> the old implementation, proving that both implementations work >> >> >> >> concurrently. >> >> >> >> >> >> >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> >> >> >> Reviewed-by: Kevin Wolf <kwolf@redhat.com> >> >> >> >> Reviewed-by: Hanna Reitz <hreitz@redhat.com> >> >> >> >> Message-id: 20211026175612.4127598-9-jsnow@redhat.com >> >> >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> >> >> >> >> >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 >> >> >> >> Author: John Snow <jsnow@redhat.com> >> >> >> >> Date: Thu Nov 18 15:46:18 2021 -0500 >> >> >> >> >> >> >> >> python/machine: handle "fast" QEMU terminations >> >> >> >> >> >> >> >> In the case that the QEMU process actually launches -- but then >> >> dies so >> >> >> >> quickly that we can't establish a QMP connection to it -- >> QEMUMachine >> >> >> >> currently calls _post_shutdown() assuming that it never launched >> >> the VM >> >> >> >> process. >> >> >> >> >> >> >> >> This isn't true, though: it "merely" may have failed to establish >> >> a QMP >> >> >> >> connection and the process is in the middle of its own exit path. >> >> >> >> >> >> >> >> If we don't wait for the subprocess, the caller may get a bogus >> >> `None` >> >> >> >> return for .exitcode(). This behavior was observed from >> >> >> >> device-crash-test; after the switch to Async QMP, the timings were >> >> >> >> changed such that it was now seemingly possible to witness the >> >> failure >> >> >> >> of "vm.launch()" *prior* to the exitcode becoming available. >> >> >> >> >> >> >> >> The semantic of the `_launched` property is changed in this >> >> >> >> patch. Instead of representing the condition "launch() executed >> >> >> >> successfully", it will now represent "has forked a child process >> >> >> >> successfully". This way, wait() when called in the exit path won't >> >> >> >> become a no-op. >> >> >> >> >> >> >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> >> >> >> Reviewed-by: Willian Rampazzo <willianr@redhat.com> >> >> >> >> Message-id: 20211118204620.1897674-6-jsnow@redhat.com >> >> >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> >> >> >> >> >> >> >> >> >> >> >> On 3/25/22 11:17, Li Zhang wrote: >> >>> Hi, >> >>> >> >>> I backport some iotests patches to the tree and change timeout. >> >>> It doesn't work. >> >>> >> >>> Sometimes, iotest41 also reports the errors. >> >>> [ 1347s] >> >>> >> +====================================================================== >> >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain >> (__main__.TestSingleDrive) >> >>> [ 1347s] >> >>> >> +---------------------------------------------------------------------- >> >>> [ 1347s] +Traceback (most recent call last): >> >>> [ 1347s] + File >> >>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>> line 399, in launch >> >>> [ 1347s] + self._launch() >> >>> [ 1347s] + File >> >>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>> line 434, in _launch >> >>> [ 1347s] + self._post_launch() >> >>> [ 1347s] + File >> >>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", line >> >>> 147, in _post_launch >> >>> [ 1347s] + super()._post_launch() >> >>> [ 1347s] + File >> >>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>> line 340, in _post_launch >> >>> [ 1347s] + self._qmp.accept(self._qmp_timer) >> >>> [ 1347s] + File >> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> >>> line 69, in accept >> >>> [ 1347s] + timeout >> >>> [ 1347s] + File >> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> >>> line 42, in _sync >> >>> [ 1347s] + asyncio.wait_for(future, timeout=timeout) >> >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/base_events.py", line >> >>> 488, in run_until_complete >> >>> [ 1347s] + return future.result() >> >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, >> >>> in wait_for >> >>> [ 1347s] + raise futures.TimeoutError() >> >>> [ 1347s] +concurrent.futures._base.TimeoutError >> >>> >> >>> >> >>> I can see other errors like this, it's the problem of the socket. >> >>> >> >>> [ 1535s] socket_accept failed: Resource temporarily unavailable >> >>> [ 1535s] ** >> >>> [ 1535s] >> >>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: >> >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) >> >>> >> >>> >> >>> The script is running the command as this: >> >>> /usr/bin/make -O -j4 check-block V=1 >> >>> >> >>> I can see the errors on ppc, arm or x86. >> >>> But I couldn't reproduce it when I run it manually. >> >>> >> >>> It will be appreciated if any suggestions. Thanks. >> >>> >> >>> >> >>> On 3/24/22 14:47, Li Zhang wrote: >> >>>> Hi, >> >>>> >> >>>> When I run the testsuit on our buidling system, it reports a timeout >> >>>> sometimes not always as the following. >> >>>> It couldn't connect qmp socket. Any ideas about this problem? >> >>>> >> >>>> >> >>>> [ 1989s] --- >> >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out >> >>>> [ 1989s] +++ 040.out.bad >> >>>> [ 1989s] @@ -1,5 +1,55 @@ >> >>>> [ 1989s] >> >>>> -................................................................. >> >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to >> >>>> establish connection: asyncio.exceptions.CancelledError >> >>>> [ 1989s] >> >>>> >> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed >> >> >>>> to establish connection: asyncio.exceptions.CancelledError >> >>>> [ 1989s] +E......................... >> >>>> [ 1989s] >> >>>> >> +====================================================================== >> >>>> [ 1989s] +ERROR: test_commit_node >> (__main__.TestActiveZeroLengthImage) >> >>>> [ 1989s] >> >>>> >> +---------------------------------------------------------------------- >> >>>> [ 1989s] +Traceback (most recent call last): >> >>>> [ 1989s] + File >> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", >> >>>> line 94, in setUp >> >>>> [ 1989s] + self.vm.launch() >> >>>> [ 1989s] + File >> >>>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>>> line 399, in launch >> >>>> [ 1989s] + self._launch() >> >>>> [ 1989s] + File >> >>>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>>> line 434, in _launch >> >>>> [ 1989s] + self._post_launch() >> >>>> [ 1989s] + File >> >>>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", >> >>>> line 147, in _post_launch >> >>>> [ 1989s] + super()._post_launch() >> >>>> [ 1989s] + File >> >>>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>>> line 340, in _post_launch >> >>>> [ 1989s] + self._qmp.accept(self._qmp_timer) >> >>>> [ 1989s] + File >> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> >>>> line 67, in accept >> >>>> [ 1989s] + self._sync( >> >>>> [ 1989s] + File >> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> >>>> line 41, in _sync >> >>>> [ 1989s] + return self._aloop.run_until_complete( >> >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/base_events.py", line >> >>>> 616, in run_until_complete >> >>>> [ 1989s] + return future.result() >> >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, >> >>>> in wait_for >> >>>> [ 1989s] + raise exceptions.TimeoutError() >> >>>> [ 1989s] +asyncio.exceptions.TimeoutError >> >>>> >> >>> >> >>> >> >> >> > >> >> >> [-- Attachment #2: Type: text/html, Size: 15491 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-31 14:46 ` John Snow @ 2022-03-31 15:18 ` Li Zhang 2022-04-06 14:48 ` Li Zhang 1 sibling, 0 replies; 11+ messages in thread From: Li Zhang @ 2022-03-31 15:18 UTC (permalink / raw) To: John Snow; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers On 3/31/22 16:46, John Snow wrote: > > > On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> wrote: > > > > On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de > <mailto:lizhang@suse.de>> wrote: > > On 3/31/22 08:10, Hanna Reitz wrote: > > On 29.03.22 18:49, Li Zhang wrote: > >> Update what I observed. > >> > >> It seems that aqmp is not stable when running test cases. > >> So I revert the patches as the following, iotest40/41 test > cases work > >> well. > > > > Thanks for bisecting. I haven’t seen this problem before, so > I didn’t > > look into it; CC-ing John, perhaps he figures something. > > > Thanks a lot. > It happens occassionally. I didn't find a way to reproduce it > manually. > > > Sorry for the instability. > > A few questions then: > > - what operating system and version > - kernel version > - architecture (looks like x86 and others?) > - what commit/version of QEMU are you testing? > - what python version are you using to run iotests? > > > I'll also point out these patchsets, because we're in different > timezones and if I'm lucky it will be the answer: > > [1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/ > <https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/> > > [2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/ > <https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/> > > #1 wasn't sufficient to fix the problem, but I had hoped #2 was. The > second patchset was merged somewhat recently, just before rc testing for > 7.0. > That's great, thanks. I consider to backport them and do some testing. > Thanks, > --js > > > > > > > Hanna > > > >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9 > >> > >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>> > >> > >> Date: Tue Oct 26 13:56:12 2021 -0400 > >> > >> > >> > >> python, iotests: replace qmp with aqmp > >> > >> > >> > >> Swap out the synchronous QEMUMonitorProtocol from > qemu.qmp with > >> the sync > >> > >> wrapper from qemu.aqmp instead. > >> > >> > >> > >> Add an escape hatch in the form of the environment variable > >> > >> QEMU_PYTHON_LEGACY_QMP which allows you to cajole > QEMUMachine into > >> using > >> > >> the old implementation, proving that both > implementations work > >> > >> concurrently. > >> > >> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> Reviewed-by: Kevin Wolf <kwolf@redhat.com > <mailto:kwolf@redhat.com>> > >> > >> Reviewed-by: Hanna Reitz <hreitz@redhat.com > <mailto:hreitz@redhat.com>> > >> > >> Message-id: 20211026175612.4127598-9-jsnow@redhat.com > <mailto:20211026175612.4127598-9-jsnow@redhat.com> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> > >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 > >> > >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>> > >> > >> Date: Thu Nov 18 15:46:18 2021 -0500 > >> > >> > >> > >> python/machine: handle "fast" QEMU terminations > >> > >> > >> > >> In the case that the QEMU process actually launches -- > but then > >> dies so > >> > >> quickly that we can't establish a QMP connection to it > -- QEMUMachine > >> > >> currently calls _post_shutdown() assuming that it never > launched > >> the VM > >> > >> process. > >> > >> > >> > >> This isn't true, though: it "merely" may have failed to > establish > >> a QMP > >> > >> connection and the process is in the middle of its own > exit path. > >> > >> > >> > >> If we don't wait for the subprocess, the caller may get > a bogus > >> `None` > >> > >> return for .exitcode(). This behavior was observed from > >> > >> device-crash-test; after the switch to Async QMP, the > timings were > >> > >> changed such that it was now seemingly possible to > witness the > >> failure > >> > >> of "vm.launch()" *prior* to the exitcode becoming available. > >> > >> > >> > >> The semantic of the `_launched` property is changed in this > >> > >> patch. Instead of representing the condition "launch() > executed > >> > >> successfully", it will now represent "has forked a child > process > >> > >> successfully". This way, wait() when called in the exit > path won't > >> > >> become a no-op. > >> > >> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> Reviewed-by: Willian Rampazzo <willianr@redhat.com > <mailto:willianr@redhat.com>> > >> > >> Message-id: 20211118204620.1897674-6-jsnow@redhat.com > <mailto:20211118204620.1897674-6-jsnow@redhat.com> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> > >> > >> > >> > >> On 3/25/22 11:17, Li Zhang wrote: > >>> Hi, > >>> > >>> I backport some iotests patches to the tree and change timeout. > >>> It doesn't work. > >>> > >>> Sometimes, iotest41 also reports the errors. > >>> [ 1347s] > >>> > +====================================================================== > >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain > (__main__.TestSingleDrive) > >>> [ 1347s] > >>> > +---------------------------------------------------------------------- > >>> [ 1347s] +Traceback (most recent call last): > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 399, in launch > >>> [ 1347s] + self._launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 434, in _launch > >>> [ 1347s] + self._post_launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > line > >>> 147, in _post_launch > >>> [ 1347s] + super()._post_launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 340, in _post_launch > >>> [ 1347s] + self._qmp.accept(self._qmp_timer) > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > > >>> line 69, in accept > >>> [ 1347s] + timeout > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > > >>> line 42, in _sync > >>> [ 1347s] + asyncio.wait_for(future, timeout=timeout) > >>> [ 1347s] + File > "/usr/lib64/python3.6/asyncio/base_events.py", line > >>> 488, in run_until_complete > >>> [ 1347s] + return future.result() > >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", > line 362, > >>> in wait_for > >>> [ 1347s] + raise futures.TimeoutError() > >>> [ 1347s] +concurrent.futures._base.TimeoutError > >>> > >>> > >>> I can see other errors like this, it's the problem of the > socket. > >>> > >>> [ 1535s] socket_accept failed: Resource temporarily unavailable > >>> [ 1535s] ** > >>> [ 1535s] > >>> > ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: > > >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) > >>> > >>> > >>> The script is running the command as this: > >>> /usr/bin/make -O -j4 check-block V=1 > >>> > >>> I can see the errors on ppc, arm or x86. > >>> But I couldn't reproduce it when I run it manually. > >>> > >>> It will be appreciated if any suggestions. Thanks. > >>> > >>> > >>> On 3/24/22 14:47, Li Zhang wrote: > >>>> Hi, > >>>> > >>>> When I run the testsuit on our buidling system, it reports > a timeout > >>>> sometimes not always as the following. > >>>> It couldn't connect qmp socket. Any ideas about this problem? > >>>> > >>>> > >>>> [ 1989s] --- > >>>> > /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out > >>>> [ 1989s] +++ 040.out.bad > >>>> [ 1989s] @@ -1,5 +1,55 @@ > >>>> [ 1989s] > >>>> > -................................................................. > >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to > >>>> establish connection: asyncio.exceptions.CancelledError > >>>> [ 1989s] > >>>> > +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed > > >>>> to establish connection: asyncio.exceptions.CancelledError > >>>> [ 1989s] +E......................... > >>>> [ 1989s] > >>>> > +====================================================================== > >>>> [ 1989s] +ERROR: test_commit_node > (__main__.TestActiveZeroLengthImage) > >>>> [ 1989s] > >>>> > +---------------------------------------------------------------------- > >>>> [ 1989s] +Traceback (most recent call last): > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", > >>>> line 94, in setUp > >>>> [ 1989s] + self.vm.launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 399, in launch > >>>> [ 1989s] + self._launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 434, in _launch > >>>> [ 1989s] + self._post_launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > > >>>> line 147, in _post_launch > >>>> [ 1989s] + super()._post_launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 340, in _post_launch > >>>> [ 1989s] + self._qmp.accept(self._qmp_timer) > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > > >>>> line 67, in accept > >>>> [ 1989s] + self._sync( > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > > >>>> line 41, in _sync > >>>> [ 1989s] + return self._aloop.run_until_complete( > >>>> [ 1989s] + File > "/usr/lib64/python3.8/asyncio/base_events.py", line > >>>> 616, in run_until_complete > >>>> [ 1989s] + return future.result() > >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", > line 501, > >>>> in wait_for > >>>> [ 1989s] + raise exceptions.TimeoutError() > >>>> [ 1989s] +asyncio.exceptions.TimeoutError > >>>> > >>> > >>> > >> > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-31 14:46 ` John Snow 2022-03-31 15:18 ` Li Zhang @ 2022-04-06 14:48 ` Li Zhang 2022-04-06 15:11 ` John Snow 1 sibling, 1 reply; 11+ messages in thread From: Li Zhang @ 2022-04-06 14:48 UTC (permalink / raw) To: John Snow; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers On 3/31/22 16:46, John Snow wrote: > > > On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> wrote: > > > > On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de > <mailto:lizhang@suse.de>> wrote: > > On 3/31/22 08:10, Hanna Reitz wrote: > > On 29.03.22 18:49, Li Zhang wrote: > >> Update what I observed. > >> > >> It seems that aqmp is not stable when running test cases. > >> So I revert the patches as the following, iotest40/41 test > cases work > >> well. > > > > Thanks for bisecting. I haven’t seen this problem before, so > I didn’t > > look into it; CC-ing John, perhaps he figures something. > > > Thanks a lot. > It happens occassionally. I didn't find a way to reproduce it > manually. > > > Sorry for the instability. > > A few questions then: > > - what operating system and version > - kernel version > - architecture (looks like x86 and others?) > - what commit/version of QEMU are you testing? > - what python version are you using to run iotests? > > > I'll also point out these patchsets, because we're in different > timezones and if I'm lucky it will be the answer: > > [1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/ > <https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/> > > [2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/ > <https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/> Thanks a lot. I backport aqmp related patches, it looks good to run the cases iotest040/041. > > #1 wasn't sufficient to fix the problem, but I had hoped #2 was. The > second patchset was merged somewhat recently, just before rc testing for > 7.0. > > Thanks, > --js > > > > > > > Hanna > > > >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9 > >> > >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>> > >> > >> Date: Tue Oct 26 13:56:12 2021 -0400 > >> > >> > >> > >> python, iotests: replace qmp with aqmp > >> > >> > >> > >> Swap out the synchronous QEMUMonitorProtocol from > qemu.qmp with > >> the sync > >> > >> wrapper from qemu.aqmp instead. > >> > >> > >> > >> Add an escape hatch in the form of the environment variable > >> > >> QEMU_PYTHON_LEGACY_QMP which allows you to cajole > QEMUMachine into > >> using > >> > >> the old implementation, proving that both > implementations work > >> > >> concurrently. > >> > >> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> Reviewed-by: Kevin Wolf <kwolf@redhat.com > <mailto:kwolf@redhat.com>> > >> > >> Reviewed-by: Hanna Reitz <hreitz@redhat.com > <mailto:hreitz@redhat.com>> > >> > >> Message-id: 20211026175612.4127598-9-jsnow@redhat.com > <mailto:20211026175612.4127598-9-jsnow@redhat.com> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> > >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 > >> > >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>> > >> > >> Date: Thu Nov 18 15:46:18 2021 -0500 > >> > >> > >> > >> python/machine: handle "fast" QEMU terminations > >> > >> > >> > >> In the case that the QEMU process actually launches -- > but then > >> dies so > >> > >> quickly that we can't establish a QMP connection to it > -- QEMUMachine > >> > >> currently calls _post_shutdown() assuming that it never > launched > >> the VM > >> > >> process. > >> > >> > >> > >> This isn't true, though: it "merely" may have failed to > establish > >> a QMP > >> > >> connection and the process is in the middle of its own > exit path. > >> > >> > >> > >> If we don't wait for the subprocess, the caller may get > a bogus > >> `None` > >> > >> return for .exitcode(). This behavior was observed from > >> > >> device-crash-test; after the switch to Async QMP, the > timings were > >> > >> changed such that it was now seemingly possible to > witness the > >> failure > >> > >> of "vm.launch()" *prior* to the exitcode becoming available. > >> > >> > >> > >> The semantic of the `_launched` property is changed in this > >> > >> patch. Instead of representing the condition "launch() > executed > >> > >> successfully", it will now represent "has forked a child > process > >> > >> successfully". This way, wait() when called in the exit > path won't > >> > >> become a no-op. > >> > >> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> Reviewed-by: Willian Rampazzo <willianr@redhat.com > <mailto:willianr@redhat.com>> > >> > >> Message-id: 20211118204620.1897674-6-jsnow@redhat.com > <mailto:20211118204620.1897674-6-jsnow@redhat.com> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> > >> > >> > >> > >> On 3/25/22 11:17, Li Zhang wrote: > >>> Hi, > >>> > >>> I backport some iotests patches to the tree and change timeout. > >>> It doesn't work. > >>> > >>> Sometimes, iotest41 also reports the errors. > >>> [ 1347s] > >>> > +====================================================================== > >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain > (__main__.TestSingleDrive) > >>> [ 1347s] > >>> > +---------------------------------------------------------------------- > >>> [ 1347s] +Traceback (most recent call last): > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 399, in launch > >>> [ 1347s] + self._launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 434, in _launch > >>> [ 1347s] + self._post_launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > line > >>> 147, in _post_launch > >>> [ 1347s] + super()._post_launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 340, in _post_launch > >>> [ 1347s] + self._qmp.accept(self._qmp_timer) > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > > >>> line 69, in accept > >>> [ 1347s] + timeout > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > > >>> line 42, in _sync > >>> [ 1347s] + asyncio.wait_for(future, timeout=timeout) > >>> [ 1347s] + File > "/usr/lib64/python3.6/asyncio/base_events.py", line > >>> 488, in run_until_complete > >>> [ 1347s] + return future.result() > >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", > line 362, > >>> in wait_for > >>> [ 1347s] + raise futures.TimeoutError() > >>> [ 1347s] +concurrent.futures._base.TimeoutError > >>> > >>> > >>> I can see other errors like this, it's the problem of the > socket. > >>> > >>> [ 1535s] socket_accept failed: Resource temporarily unavailable > >>> [ 1535s] ** > >>> [ 1535s] > >>> > ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: > > >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) > >>> > >>> > >>> The script is running the command as this: > >>> /usr/bin/make -O -j4 check-block V=1 > >>> > >>> I can see the errors on ppc, arm or x86. > >>> But I couldn't reproduce it when I run it manually. > >>> > >>> It will be appreciated if any suggestions. Thanks. > >>> > >>> > >>> On 3/24/22 14:47, Li Zhang wrote: > >>>> Hi, > >>>> > >>>> When I run the testsuit on our buidling system, it reports > a timeout > >>>> sometimes not always as the following. > >>>> It couldn't connect qmp socket. Any ideas about this problem? > >>>> > >>>> > >>>> [ 1989s] --- > >>>> > /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out > >>>> [ 1989s] +++ 040.out.bad > >>>> [ 1989s] @@ -1,5 +1,55 @@ > >>>> [ 1989s] > >>>> > -................................................................. > >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to > >>>> establish connection: asyncio.exceptions.CancelledError > >>>> [ 1989s] > >>>> > +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed > > >>>> to establish connection: asyncio.exceptions.CancelledError > >>>> [ 1989s] +E......................... > >>>> [ 1989s] > >>>> > +====================================================================== > >>>> [ 1989s] +ERROR: test_commit_node > (__main__.TestActiveZeroLengthImage) > >>>> [ 1989s] > >>>> > +---------------------------------------------------------------------- > >>>> [ 1989s] +Traceback (most recent call last): > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", > >>>> line 94, in setUp > >>>> [ 1989s] + self.vm.launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 399, in launch > >>>> [ 1989s] + self._launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 434, in _launch > >>>> [ 1989s] + self._post_launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > > >>>> line 147, in _post_launch > >>>> [ 1989s] + super()._post_launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 340, in _post_launch > >>>> [ 1989s] + self._qmp.accept(self._qmp_timer) > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > > >>>> line 67, in accept > >>>> [ 1989s] + self._sync( > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > > >>>> line 41, in _sync > >>>> [ 1989s] + return self._aloop.run_until_complete( > >>>> [ 1989s] + File > "/usr/lib64/python3.8/asyncio/base_events.py", line > >>>> 616, in run_until_complete > >>>> [ 1989s] + return future.result() > >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", > line 501, > >>>> in wait_for > >>>> [ 1989s] + raise exceptions.TimeoutError() > >>>> [ 1989s] +asyncio.exceptions.TimeoutError > >>>> > >>> > >>> > >> > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-04-06 14:48 ` Li Zhang @ 2022-04-06 15:11 ` John Snow 0 siblings, 0 replies; 11+ messages in thread From: John Snow @ 2022-04-06 15:11 UTC (permalink / raw) To: Li Zhang; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 1964 bytes --] On Wed, Apr 6, 2022, 10:53 AM Li Zhang <lizhang@suse.de> wrote: > On 3/31/22 16:46, John Snow wrote: > > > > > > On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com > > <mailto:jsnow@redhat.com>> wrote: > > > > > > > > On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de > > <mailto:lizhang@suse.de>> wrote: > > > > On 3/31/22 08:10, Hanna Reitz wrote: > > > On 29.03.22 18:49, Li Zhang wrote: > > >> Update what I observed. > > >> > > >> It seems that aqmp is not stable when running test cases. > > >> So I revert the patches as the following, iotest40/41 test > > cases work > > >> well. > > > > > > Thanks for bisecting. I haven’t seen this problem before, so > > I didn’t > > > look into it; CC-ing John, perhaps he figures something. > > > > > Thanks a lot. > > It happens occassionally. I didn't find a way to reproduce it > > manually. > > > > > > Sorry for the instability. > > > > A few questions then: > > > > - what operating system and version > > - kernel version > > - architecture (looks like x86 and others?) > > - what commit/version of QEMU are you testing? > > - what python version are you using to run iotests? > > > > > > I'll also point out these patchsets, because we're in different > > timezones and if I'm lucky it will be the answer: > > > > [1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/ > > <https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/> > > > > [2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/ > > <https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/> > > Thanks a lot. > > I backport aqmp related patches, it looks good to run the cases > iotest040/041. > Excellent, thanks for reporting back! [-- Attachment #2: Type: text/html, Size: 3534 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: iotest40 problem 2022-03-31 14:37 ` John Snow 2022-03-31 14:46 ` John Snow @ 2022-03-31 15:16 ` Li Zhang 1 sibling, 0 replies; 11+ messages in thread From: Li Zhang @ 2022-03-31 15:16 UTC (permalink / raw) To: John Snow; +Cc: Kevin Wolf, Hanna Reitz, QEMU Developers On 3/31/22 16:37, John Snow wrote: > > > On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de > <mailto:lizhang@suse.de>> wrote: > > On 3/31/22 08:10, Hanna Reitz wrote: > > On 29.03.22 18:49, Li Zhang wrote: > >> Update what I observed. > >> > >> It seems that aqmp is not stable when running test cases. > >> So I revert the patches as the following, iotest40/41 test cases > work > >> well. > > > > Thanks for bisecting. I haven’t seen this problem before, so I > didn’t > > look into it; CC-ing John, perhaps he figures something. > > > Thanks a lot. > It happens occassionally. I didn't find a way to reproduce it manually. > > > Sorry for the instability. > > A few questions then: > > - what operating system and version > - kernel version > - architecture (looks like x86 and others?) > - what commit/version of QEMU are you testing? > - what python version are you using to run iotests? > Hi John, Thanks for your reply, here is the answer. - openSUSE15.3, openSUSE15.4 - Kernel version: openSUSE15.3: kernel-obs-build-5.3.18-150300.59.60.4 openSUSE15.4: kernel-obs-build-5.14.21-150400.15.2 - architecture: X86, aarch64, s390 - QEMU version is based on qemu6.2.0, I also backport iotest patches after QEMU6.2.0 - python: openSUSE 15.3: python3-base-3.6.15-150300.10.21.1 openSuse15.4: python3-base-3.6.15-150300.10.21.1 This issue happens on S390, aarch64 a lot. > > > > Hanna > > > >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9 > >> > >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>> > >> > >> Date: Tue Oct 26 13:56:12 2021 -0400 > >> > >> > >> > >> python, iotests: replace qmp with aqmp > >> > >> > >> > >> Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with > >> the sync > >> > >> wrapper from qemu.aqmp instead. > >> > >> > >> > >> Add an escape hatch in the form of the environment variable > >> > >> QEMU_PYTHON_LEGACY_QMP which allows you to cajole > QEMUMachine into > >> using > >> > >> the old implementation, proving that both implementations work > >> > >> concurrently. > >> > >> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> Reviewed-by: Kevin Wolf <kwolf@redhat.com > <mailto:kwolf@redhat.com>> > >> > >> Reviewed-by: Hanna Reitz <hreitz@redhat.com > <mailto:hreitz@redhat.com>> > >> > >> Message-id: 20211026175612.4127598-9-jsnow@redhat.com > <mailto:20211026175612.4127598-9-jsnow@redhat.com> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> > >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 > >> > >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>> > >> > >> Date: Thu Nov 18 15:46:18 2021 -0500 > >> > >> > >> > >> python/machine: handle "fast" QEMU terminations > >> > >> > >> > >> In the case that the QEMU process actually launches -- but then > >> dies so > >> > >> quickly that we can't establish a QMP connection to it -- > QEMUMachine > >> > >> currently calls _post_shutdown() assuming that it never > launched > >> the VM > >> > >> process. > >> > >> > >> > >> This isn't true, though: it "merely" may have failed to > establish > >> a QMP > >> > >> connection and the process is in the middle of its own exit > path. > >> > >> > >> > >> If we don't wait for the subprocess, the caller may get a bogus > >> `None` > >> > >> return for .exitcode(). This behavior was observed from > >> > >> device-crash-test; after the switch to Async QMP, the > timings were > >> > >> changed such that it was now seemingly possible to witness the > >> failure > >> > >> of "vm.launch()" *prior* to the exitcode becoming available. > >> > >> > >> > >> The semantic of the `_launched` property is changed in this > >> > >> patch. Instead of representing the condition "launch() executed > >> > >> successfully", it will now represent "has forked a child process > >> > >> successfully". This way, wait() when called in the exit path > won't > >> > >> become a no-op. > >> > >> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> Reviewed-by: Willian Rampazzo <willianr@redhat.com > <mailto:willianr@redhat.com>> > >> > >> Message-id: 20211118204620.1897674-6-jsnow@redhat.com > <mailto:20211118204620.1897674-6-jsnow@redhat.com> > >> > >> Signed-off-by: John Snow <jsnow@redhat.com > <mailto:jsnow@redhat.com>> > >> > >> > >> > >> > >> > >> On 3/25/22 11:17, Li Zhang wrote: > >>> Hi, > >>> > >>> I backport some iotests patches to the tree and change timeout. > >>> It doesn't work. > >>> > >>> Sometimes, iotest41 also reports the errors. > >>> [ 1347s] > >>> > +====================================================================== > >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain > (__main__.TestSingleDrive) > >>> [ 1347s] > >>> > +---------------------------------------------------------------------- > >>> [ 1347s] +Traceback (most recent call last): > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 399, in launch > >>> [ 1347s] + self._launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 434, in _launch > >>> [ 1347s] + self._post_launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > line > >>> 147, in _post_launch > >>> [ 1347s] + super()._post_launch() > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>> line 340, in _post_launch > >>> [ 1347s] + self._qmp.accept(self._qmp_timer) > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > >>> line 69, in accept > >>> [ 1347s] + timeout > >>> [ 1347s] + File > >>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > >>> line 42, in _sync > >>> [ 1347s] + asyncio.wait_for(future, timeout=timeout) > >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/base_events.py", > line > >>> 488, in run_until_complete > >>> [ 1347s] + return future.result() > >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", line > 362, > >>> in wait_for > >>> [ 1347s] + raise futures.TimeoutError() > >>> [ 1347s] +concurrent.futures._base.TimeoutError > >>> > >>> > >>> I can see other errors like this, it's the problem of the socket. > >>> > >>> [ 1535s] socket_accept failed: Resource temporarily unavailable > >>> [ 1535s] ** > >>> [ 1535s] > >>> > ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: > >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) > >>> > >>> > >>> The script is running the command as this: > >>> /usr/bin/make -O -j4 check-block V=1 > >>> > >>> I can see the errors on ppc, arm or x86. > >>> But I couldn't reproduce it when I run it manually. > >>> > >>> It will be appreciated if any suggestions. Thanks. > >>> > >>> > >>> On 3/24/22 14:47, Li Zhang wrote: > >>>> Hi, > >>>> > >>>> When I run the testsuit on our buidling system, it reports a > timeout > >>>> sometimes not always as the following. > >>>> It couldn't connect qmp socket. Any ideas about this problem? > >>>> > >>>> > >>>> [ 1989s] --- > >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out > >>>> [ 1989s] +++ 040.out.bad > >>>> [ 1989s] @@ -1,5 +1,55 @@ > >>>> [ 1989s] > >>>> -................................................................. > >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to > >>>> establish connection: asyncio.exceptions.CancelledError > >>>> [ 1989s] > >>>> > +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed > > >>>> to establish connection: asyncio.exceptions.CancelledError > >>>> [ 1989s] +E......................... > >>>> [ 1989s] > >>>> > +====================================================================== > >>>> [ 1989s] +ERROR: test_commit_node > (__main__.TestActiveZeroLengthImage) > >>>> [ 1989s] > >>>> > +---------------------------------------------------------------------- > >>>> [ 1989s] +Traceback (most recent call last): > >>>> [ 1989s] + File > >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", > >>>> line 94, in setUp > >>>> [ 1989s] + self.vm.launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 399, in launch > >>>> [ 1989s] + self._launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 434, in _launch > >>>> [ 1989s] + self._post_launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", > >>>> line 147, in _post_launch > >>>> [ 1989s] + super()._post_launch() > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", > > >>>> line 340, in _post_launch > >>>> [ 1989s] + self._qmp.accept(self._qmp_timer) > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > >>>> line 67, in accept > >>>> [ 1989s] + self._sync( > >>>> [ 1989s] + File > >>>> > "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", > >>>> line 41, in _sync > >>>> [ 1989s] + return self._aloop.run_until_complete( > >>>> [ 1989s] + File > "/usr/lib64/python3.8/asyncio/base_events.py", line > >>>> 616, in run_until_complete > >>>> [ 1989s] + return future.result() > >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line > 501, > >>>> in wait_for > >>>> [ 1989s] + raise exceptions.TimeoutError() > >>>> [ 1989s] +asyncio.exceptions.TimeoutError > >>>> > >>> > >>> > >> > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-04-06 15:13 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-03-24 13:47 iotest40 problem Li Zhang 2022-03-25 10:17 ` Li Zhang 2022-03-29 16:49 ` Li Zhang 2022-03-31 6:10 ` Hanna Reitz 2022-03-31 10:44 ` Li Zhang 2022-03-31 14:37 ` John Snow 2022-03-31 14:46 ` John Snow 2022-03-31 15:18 ` Li Zhang 2022-04-06 14:48 ` Li Zhang 2022-04-06 15:11 ` John Snow 2022-03-31 15:16 ` Li Zhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).