* Re: 'make check-acceptance' failing on s390 tests?
2022-02-18 15:04 'make check-acceptance' failing on s390 tests? Peter Maydell
@ 2022-02-18 23:17 ` Richard Henderson
2022-02-21 15:27 ` Thomas Huth
2022-03-11 17:52 ` Thomas Huth
2 siblings, 0 replies; 4+ messages in thread
From: Richard Henderson @ 2022-02-18 23:17 UTC (permalink / raw)
To: Peter Maydell, QEMU Developers
Cc: Thomas Huth, Daniel P. Berrange, Beraldo Leal, Cornelia Huck,
Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
qemu-s390x, Cleber Rosa, Alex Bennée
On 2/19/22 02:04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail on some of
> the s390 tests?
>
> (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
> 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
> (900.20 s)
>
>
> (090/183) tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora:
> FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)
FWIW, yes, I'm seeing those.
r~
* Re: 'make check-acceptance' failing on s390 tests?
From: Thomas Huth @ 2022-02-21 15:27 UTC (permalink / raw)
To: Peter Maydell, QEMU Developers
Cc: Daniel P. Berrange, Beraldo Leal, Cornelia Huck,
Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
qemu-s390x, Cleber Rosa, Alex Bennée
On 18/02/2022 16.04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail on some of
> the s390 tests?
>
> (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
> 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
> (900.20 s)
>
>
> (090/183) tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora:
> FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)
>
>
> I've cc'd Daniel because the 090 at least looks like a resolution
> baked into the test case, and commit de72c4b7c that went in
> last month changed the EDID reported resolution from 1024x768
> to 1280x800.
Yes, that seems to be right - since the default monitor resolution changed,
the screenshot now has a different size, too. I sent a patch here:
https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg04473.html
> Not sure about the timeout on the boot test: the avocado log
> shows it booting at least as far as
> "Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
> and then there's no further output until the timeout.
> Unfortunately the avocado log doesn't seem to include useful
> information like "this is the string we were waiting to see", so
> I'm not sure exactly what's gone wrong there.
>
> (I continue to find the Avocado tests rather opaque: when you
> get a series of green OK's that's fine, but when you get a failure
> it's often non-obvious why it failed or how to do simple things
> like "rerun just that one failed test" or "run the failing command,
> interactively on the command line".)
For me, it's even worse with tests/avocado/boot_linux.py: none of those
tests work on my local laptop, so I have always ignored them until now.
FWIW, I'm seeing this Python backtrace in the log:
Reproduced traceback from: /home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/core/test.py:770
Traceback (most recent call last):
  File "/home/thuth/tmp/qemu-build/tests/avocado/boot_linux.py", line 30, in test_pc_i440fx_tcg
    self.launch_and_wait(set_up_ssh_connection=False)
  File "/home/thuth/tmp/qemu-build/tests/avocado/avocado_qemu/__init__.py", line 636, in launch_and_wait
    cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), self.name)
  File "/home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/utils/cloudinit.py", line 192, in wait_for_phone_home
    s = PhoneHomeServer(address, instance_id)
  File "/home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/utils/cloudinit.py", line 173, in __init__
    HTTPServer.__init__(self, address, PhoneHomeServerHandler)
  File "/usr/lib64/python3.6/socketserver.py", line 456, in __init__
    self.server_bind()
  File "/usr/lib64/python3.6/http/server.py", line 136, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/lib64/python3.6/socketserver.py", line 470, in server_bind
    self.socket.bind(self.server_address)
TypeError: an integer is required (got type NoneType)
... no clue how to debug these problems, though.
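
One reading of the traceback above: the address passed to bind() is ('0.0.0.0', self.phone_home_port), and the TypeError complains about a NoneType where an integer is expected, which suggests the port was None when the server was created. The sketch below is only an illustration of that failure mode with a plain socket; the helper name bind_error is made up for this example, and it binds to 127.0.0.1 rather than 0.0.0.0 to stay local. The exact wording of the exception message differs between Python versions.

```python
import socket


def bind_error(port):
    """Try to bind a TCP socket to 127.0.0.1:port.

    Returns the TypeError raised by bind(), or None if the bind worked.
    (Hypothetical helper, not part of avocado or QEMU.)
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("127.0.0.1", port))
        except TypeError as exc:
            return exc
    return None


# A port of None is rejected before the kernel is ever asked to bind.
print(type(bind_error(None)).__name__)
# Port 0 asks the OS for any free port, so this bind is expected to succeed.
print(bind_error(0))
```

If this reading is right, the place to look is wherever phone_home_port gets its value, not the HTTP server code itself.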
> The 090 failure didn't cause the merge to be rejected because
> in commit 333168efe5c8 we disabled both these tests when
> running on GitLab.
>
> Suggestion: we should either disable tests entirely (except
> for manual "I want to run this known-flaky test") or not at
> all, rather than disabling them only on GitLab. If I'm running
> 'make check-acceptance' locally I don't want to be distracted
> by tests we know to be dodgy, any more than if I were running
> the CI on GitLab.
IIRC I only saw the occasional hangs of the test on GitLab, and never on my
local host, but I see your point. I'm fine if we replace the
@skipIf(os.getenv('GITLAB_CI')...) there with a
@skipUnless(os.getenv('AVOCADO_ALLOW_FLAKY_TESTS')...) or something similar.
Would you have some spare time to write such a patch?
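
A minimal sketch of what the opt-in variant could look like, using plain unittest instead of the avocado test classes. The class and method names are placeholders for this example, and AVOCADO_ALLOW_FLAKY_TESTS is only the name floated above, not an existing setting.

```python
import os
import unittest

# Variable name as suggested in this mail; treat it as an assumption.
FLAKY_ENV = "AVOCADO_ALLOW_FLAKY_TESTS"


class BootLinuxExample(unittest.TestCase):
    """Placeholder test class, standing in for the real avocado tests."""

    @unittest.skipUnless(os.getenv(FLAKY_ENV),
                         "flaky test, set %s to run it" % FLAKY_ENV)
    def test_flaky(self):
        # Runs only when the developer explicitly opts in.
        self.assertTrue(True)

    def test_stable(self):
        # Always runs, locally and in CI alike.
        self.assertTrue(True)


def run_counts():
    """Run the example class and return (tests started, tests skipped)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BootLinuxExample)
    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun, len(result.skipped)


print(run_counts())
```

The point of the design is that the skip no longer depends on where the tests run: the flaky test is skipped everywhere unless someone sets the variable on purpose.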
Thomas
* Re: 'make check-acceptance' failing on s390 tests?
From: Thomas Huth @ 2022-03-11 17:52 UTC (permalink / raw)
To: Peter Maydell, QEMU Developers, Richard Henderson
Cc: Daniel P. Berrange, Beraldo Leal, Cornelia Huck,
Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
qemu-s390x, Cleber Rosa, Alex Bennée
On 18/02/2022 16.04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail on some of
> the s390 tests?
>
> (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
> 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
> (900.20 s)
[...]
> Not sure about the timeout on the boot test: the avocado log
> shows it booting at least as far as
> "Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
> and then there's no further output until the timeout.
Now that I've finally been able to run the test again (after
manually tweaking that borked is_port_free() function in
avocado), I've had a closer look at the failing BootLinuxS390X
test: if you look at the guest output in the log, you can see
that it fails to initialize cloud-init and thus fails to
"phone home" at the end.
This used to work fine in older versions, so I spent a
lot of time bisecting this issue and ended up here:
f83bcecb1ffe25a18367409eaf4ba1453c835c48 is the first bad commit
commit f83bcecb1ffe25a18367409eaf4ba1453c835c48
Author: Richard Henderson <richard.henderson@linaro.org>
Date: Tue Jul 27 07:48:55 2021 -1000
accel/tcg: Add cpu_{ld,st}*_mmu interfaces
Richard, could you please have a look at this one, too? ... it
causes the test to fail:
$ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48~1
$ ./configure --target-list=s390x-softmmu --disable-docs
$ make -j8
$ make check-venv
$ cd build
$ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
JOB ID : 0a6d287620d150d52c24417d0a672a1a826b3a82
JOB LOG : /home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
(1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: PASS (130.38 s)
RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME : 136.51 s
$ grep cloud-ini /home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
...
2022-03-11 18:31:52,745 datadrainer L0193 DEBUG| [ OK ] Started Initial cloud-init…ob (metadata service crawler).
$ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48
$ make -j8
$ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
JOB ID : cb143be36631515f74cb6de2b263dfe1bc0f9709
JOB LOG : /home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
(1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 'logdir': '/home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/test-res... (900.97 s)
RESULTS : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
JOB TIME : 907.16 s
$ grep cloud-ini /home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
2022-03-11 18:35:15,106 datadrainer L0193 DEBUG| Starting Initial cloud-init job (pre-networking)...
2022-03-11 18:35:21,691 datadrainer L0193 DEBUG| [FAILED] Failed to start Initial cloud-init job (pre-networking).
...
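
For longer logs, the same check as the grep above can be scripted. This is just a sketch: the function name and the regular expression are made up for this example, and the two sample lines are copied from the failing run's log output shown above.

```python
import re

# Matches systemd-style "[FAILED]" status lines that mention cloud-init.
CLOUD_INIT_FAILED = re.compile(r"\[FAILED\].*cloud-init")


def cloud_init_failures(log_lines):
    """Return the log lines that report a failed cloud-init unit."""
    return [line.rstrip("\n") for line in log_lines
            if CLOUD_INIT_FAILED.search(line)]


sample = [
    "2022-03-11 18:35:15,106 datadrainer L0193 DEBUG| "
    "Starting Initial cloud-init job (pre-networking)...",
    "2022-03-11 18:35:21,691 datadrainer L0193 DEBUG| "
    "[FAILED] Failed to start Initial cloud-init job (pre-networking).",
]

for line in cloud_init_failures(sample):
    print(line)
```

To use it on a real job.log, pass an open file object instead of the sample list.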
Thomas