qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Li Zhang <lizhang@suse.de>
To: John Snow <jsnow@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>,
	QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: iotest40 problem
Date: Thu, 31 Mar 2022 17:18:20 +0200	[thread overview]
Message-ID: <ad57ce9f-c13e-cbb7-9abd-61c67fb33e11@suse.de> (raw)
In-Reply-To: <CAFn=p-bvbpP=tMFDYiJaEhiAtETYnx-PAqCDGaPZ-wVU5ej7LQ@mail.gmail.com>

On 3/31/22 16:46, John Snow wrote:
> 
> 
> On Thu, Mar 31, 2022, 10:37 AM John Snow <jsnow@redhat.com 
> <mailto:jsnow@redhat.com>> wrote:
> 
> 
> 
>     On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizhang@suse.de
>     <mailto:lizhang@suse.de>> wrote:
> 
>         On 3/31/22 08:10, Hanna Reitz wrote:
>          > On 29.03.22 18:49, Li Zhang wrote:
>          >> Update what I observed.
>          >>
>          >> It seems that aqmp is not stable when running test cases.
>          >> So I revert the patches as the following, iotest40/41 test
>         cases work
>          >> well.
>          >
>          > Thanks for bisecting.  I haven’t seen this problem before, so
>         I didn’t
>          > look into it; CC-ing John, perhaps he figures something.
>          >
>         Thanks a lot.
>         It happens occassionally. I didn't find a way to reproduce it
>         manually.
> 
> 
>     Sorry for the instability.
> 
>     A few questions then:
> 
>     - what operating system and version
>     - kernel version
>     - architecture (looks like x86 and others?)
>     - what commit/version of QEMU are you testing?
>     - what python version are you using to run iotests?
> 
> 
> I'll also point out these patchsets, because we're in different 
> timezones and if I'm lucky it will be the answer:
> 
> [1] https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/ 
> <https://patchew.org/QEMU/20220201041134.1237016-1-jsnow@redhat.com/>
> 
> [2] https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/ 
> <https://patchew.org/QEMU/20220225205948.3693480-1-jsnow@redhat.com/>
> 
> #1 wasn't sufficient to fix the problem, but I had hoped #2 was. The 
> second patchset was merged somewhat recently, just before rc testing for 
> 7.0.
> 

That's great, thanks. I consider to backport them and do some testing.


> Thanks,
> --js
> 
> 
> 
> 
> 
>          > Hanna
>          >
>          >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>          >>
>          >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>>
>          >>
>          >> Date:   Tue Oct 26 13:56:12 2021 -0400
>          >>
>          >>
>          >>
>          >>     python, iotests: replace qmp with aqmp
>          >>
>          >>
>          >>
>          >>     Swap out the synchronous QEMUMonitorProtocol from
>         qemu.qmp with
>          >> the sync
>          >>
>          >>     wrapper from qemu.aqmp instead.
>          >>
>          >>
>          >>
>          >>     Add an escape hatch in the form of the environment variable
>          >>
>          >>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole
>         QEMUMachine into
>          >> using
>          >>
>          >>     the old implementation, proving that both
>         implementations work
>          >>
>          >>     concurrently.
>          >>
>          >>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>     Reviewed-by: Kevin Wolf <kwolf@redhat.com
>         <mailto:kwolf@redhat.com>>
>          >>
>          >>     Reviewed-by: Hanna Reitz <hreitz@redhat.com
>         <mailto:hreitz@redhat.com>>
>          >>
>          >>     Message-id: 20211026175612.4127598-9-jsnow@redhat.com
>         <mailto:20211026175612.4127598-9-jsnow@redhat.com>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>
>          >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>          >>
>          >> Author: John Snow <jsnow@redhat.com <mailto:jsnow@redhat.com>>
>          >>
>          >> Date:   Thu Nov 18 15:46:18 2021 -0500
>          >>
>          >>
>          >>
>          >>     python/machine: handle "fast" QEMU terminations
>          >>
>          >>
>          >>
>          >>     In the case that the QEMU process actually launches --
>         but then
>          >> dies so
>          >>
>          >>     quickly that we can't establish a QMP connection to it
>         -- QEMUMachine
>          >>
>          >>     currently calls _post_shutdown() assuming that it never
>         launched
>          >> the VM
>          >>
>          >>     process.
>          >>
>          >>
>          >>
>          >>     This isn't true, though: it "merely" may have failed to
>         establish
>          >> a QMP
>          >>
>          >>     connection and the process is in the middle of its own
>         exit path.
>          >>
>          >>
>          >>
>          >>     If we don't wait for the subprocess, the caller may get
>         a bogus
>          >> `None`
>          >>
>          >>     return for .exitcode(). This behavior was observed from
>          >>
>          >>     device-crash-test; after the switch to Async QMP, the
>         timings were
>          >>
>          >>     changed such that it was now seemingly possible to
>         witness the
>          >> failure
>          >>
>          >>     of "vm.launch()" *prior* to the exitcode becoming available.
>          >>
>          >>
>          >>
>          >>     The semantic of the `_launched` property is changed in this
>          >>
>          >>     patch. Instead of representing the condition "launch()
>         executed
>          >>
>          >>     successfully", it will now represent "has forked a child
>         process
>          >>
>          >>     successfully". This way, wait() when called in the exit
>         path won't
>          >>
>          >>     become a no-op.
>          >>
>          >>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>     Reviewed-by: Willian Rampazzo <willianr@redhat.com
>         <mailto:willianr@redhat.com>>
>          >>
>          >>     Message-id: 20211118204620.1897674-6-jsnow@redhat.com
>         <mailto:20211118204620.1897674-6-jsnow@redhat.com>
>          >>
>          >>     Signed-off-by: John Snow <jsnow@redhat.com
>         <mailto:jsnow@redhat.com>>
>          >>
>          >>
>          >>
>          >>
>          >>
>          >> On 3/25/22 11:17, Li Zhang wrote:
>          >>> Hi,
>          >>>
>          >>> I backport some iotests patches to the tree and change timeout.
>          >>> It doesn't work.
>          >>>
>          >>> Sometimes, iotest41 also reports the errors.
>          >>> [ 1347s]
>          >>>
>         +======================================================================
>          >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain
>         (__main__.TestSingleDrive)
>          >>> [ 1347s]
>          >>>
>         +----------------------------------------------------------------------
>          >>> [ 1347s] +Traceback (most recent call last):
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 399, in launch
>          >>> [ 1347s] +    self._launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 434, in _launch
>          >>> [ 1347s] +    self._post_launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
>         line
>          >>> 147, in _post_launch
>          >>> [ 1347s] +    super()._post_launch()
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>> line 340, in _post_launch
>          >>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>> line 69, in accept
>          >>> [ 1347s] +    timeout
>          >>> [ 1347s] +  File
>          >>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>> line 42, in _sync
>          >>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
>          >>> [ 1347s] +  File
>         "/usr/lib64/python3.6/asyncio/base_events.py", line
>          >>> 488, in run_until_complete
>          >>> [ 1347s] +    return future.result()
>          >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py",
>         line 362,
>          >>> in wait_for
>          >>> [ 1347s] +    raise futures.TimeoutError()
>          >>> [ 1347s] +concurrent.futures._base.TimeoutError
>          >>>
>          >>>
>          >>> I can see other errors like this, it's the problem of the
>         socket.
>          >>>
>          >>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>          >>> [ 1535s] **
>          >>> [ 1535s]
>          >>>
>         ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
> 
>          >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>          >>>
>          >>>
>          >>> The script is running the command as this:
>          >>> /usr/bin/make -O -j4 check-block V=1
>          >>>
>          >>> I can see the errors on ppc, arm or x86.
>          >>> But I couldn't reproduce it when I run it manually.
>          >>>
>          >>> It will be appreciated if any suggestions. Thanks.
>          >>>
>          >>>
>          >>> On 3/24/22 14:47, Li Zhang wrote:
>          >>>> Hi,
>          >>>>
>          >>>> When I run the testsuit on our buidling system, it reports
>         a timeout
>          >>>> sometimes not always as the following.
>          >>>> It couldn't connect qmp socket. Any ideas about this problem?
>          >>>>
>          >>>>
>          >>>> [ 1989s] ---
>          >>>>
>         /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>          >>>> [ 1989s] +++ 040.out.bad
>          >>>> [ 1989s] @@ -1,5 +1,55 @@
>          >>>> [ 1989s]
>          >>>>
>         -.................................................................
>          >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to
>          >>>> establish connection: asyncio.exceptions.CancelledError
>          >>>> [ 1989s]
>          >>>>
>         +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed
> 
>          >>>> to establish connection: asyncio.exceptions.CancelledError
>          >>>> [ 1989s] +E.........................
>          >>>> [ 1989s]
>          >>>>
>         +======================================================================
>          >>>> [ 1989s] +ERROR: test_commit_node
>         (__main__.TestActiveZeroLengthImage)
>          >>>> [ 1989s]
>          >>>>
>         +----------------------------------------------------------------------
>          >>>> [ 1989s] +Traceback (most recent call last):
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040",
>          >>>> line 94, in setUp
>          >>>> [ 1989s] +    self.vm.launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 399, in launch
>          >>>> [ 1989s] +    self._launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 434, in _launch
>          >>>> [ 1989s] +    self._post_launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
> 
>          >>>> line 147, in _post_launch
>          >>>> [ 1989s] +    super()._post_launch()
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
> 
>          >>>> line 340, in _post_launch
>          >>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>>> line 67, in accept
>          >>>> [ 1989s] +    self._sync(
>          >>>> [ 1989s] +  File
>          >>>>
>         "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
> 
>          >>>> line 41, in _sync
>          >>>> [ 1989s] +    return self._aloop.run_until_complete(
>          >>>> [ 1989s] +  File
>         "/usr/lib64/python3.8/asyncio/base_events.py", line
>          >>>> 616, in run_until_complete
>          >>>> [ 1989s] +    return future.result()
>          >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py",
>         line 501,
>          >>>> in wait_for
>          >>>> [ 1989s] +    raise exceptions.TimeoutError()
>          >>>> [ 1989s] +asyncio.exceptions.TimeoutError
>          >>>>
>          >>>
>          >>>
>          >>
>          >
> 
> 



  reply	other threads:[~2022-03-31 15:28 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-24 13:47 iotest40 problem Li Zhang
2022-03-25 10:17 ` Li Zhang
2022-03-29 16:49   ` Li Zhang
2022-03-31  6:10     ` Hanna Reitz
2022-03-31 10:44       ` Li Zhang
2022-03-31 14:37         ` John Snow
2022-03-31 14:46           ` John Snow
2022-03-31 15:18             ` Li Zhang [this message]
2022-04-06 14:48             ` Li Zhang
2022-04-06 15:11               ` John Snow
2022-03-31 15:16           ` Li Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad57ce9f-c13e-cbb7-9abd-61c67fb33e11@suse.de \
    --to=lizhang@suse.de \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).