qemu-devel.nongnu.org archive mirror
* 'make check-acceptance' failing on s390 tests?
@ 2022-02-18 15:04 Peter Maydell
  2022-02-18 23:17 ` Richard Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Peter Maydell @ 2022-02-18 15:04 UTC (permalink / raw)
  To: QEMU Developers
  Cc: Thomas Huth, Daniel P. Berrange, Beraldo Leal, Cornelia Huck,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	qemu-s390x, Cleber Rosa, Alex Bennée

Hi; is anybody else seeing 'make check-acceptance' fail on some of
the s390 tests?

 (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
(900.20 s)


 (090/183) tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora:
FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)


I've cc'd Daniel because the 090 at least looks like a resolution
baked into the test case, and commit de72c4b7c that went in
last month changed the EDID reported resolution from 1024x768
to 1280x800.

Not sure about the timeout on the boot test: the avocado log
shows it booting at least as far as
"Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
and then there's no further output until the timeout.
Unfortunately the avocado log doesn't seem to include useful
information like "this is the string we were waiting to see", so
I'm not sure exactly what's gone wrong there.

(I continue to find the Avocado tests rather opaque: when you
get a series of green OK's that's fine, but when you get a failure
it's often non-obvious why it failed or how to do simple things
like "rerun just that one failed test" or "run the failing command,
interactively on the command line".)
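For the "rerun just that one failed test" case, the job summary line already carries the test reference that `avocado run` accepts (the same form Thomas uses further down the thread). The following is only a sketch of pulling that reference out of a summary line: the helper names are hypothetical, and the path to the avocado binary is an assumption taken from the commands later in this thread, run from the build directory.

```python
import re
import shlex

# Matches summary lines such as:
#  (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
# The reference is the whitespace-free token before the final colon.
RESULT_LINE = re.compile(r"^\s*\(\d+/\d+\)\s+(?P<ref>\S+):(?:\s|$)")


def extract_test_reference(line):
    """Return the avocado test reference from a summary line, or None."""
    match = RESULT_LINE.match(line)
    return match.group("ref") if match else None


def rerun_command(line, avocado="./tests/venv/bin/avocado"):
    """Build a shell command that reruns only the test named on `line`.

    `avocado` is an assumed location of the avocado executable.
    """
    ref = extract_test_reference(line)
    if ref is None:
        raise ValueError("not an avocado result line: %r" % line)
    return " ".join(shlex.quote(part) for part in (avocado, "run", ref))
```

Feeding it the 009 line above would yield a command ending in `run tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg`.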

The 090 failure didn't cause the merge to be rejected because
in commit 333168efe5c8 we disabled both these tests when
running on GitLab.

Suggestion: we should either disable tests entirely (except
for manual "I want to run this known-flaky test") or not at
all, rather than disabling them only on GitLab. If I'm running
'make check-acceptance' locally I don't want to be distracted
by tests we know to be dodgy, any more than if I were running
the CI on GitLab.

thanks
-- PMM



* Re: 'make check-acceptance' failing on s390 tests?
  2022-02-18 15:04 'make check-acceptance' failing on s390 tests? Peter Maydell
@ 2022-02-18 23:17 ` Richard Henderson
  2022-02-21 15:27 ` Thomas Huth
  2022-03-11 17:52 ` Thomas Huth
  2 siblings, 0 replies; 4+ messages in thread
From: Richard Henderson @ 2022-02-18 23:17 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers
  Cc: Thomas Huth, Daniel P. Berrange, Beraldo Leal, Cornelia Huck,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	qemu-s390x, Cleber Rosa, Alex Bennée

On 2/19/22 02:04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail on some of
> the s390 tests?
> 
>   (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
> 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
> (900.20 s)
> 
> 
>   (090/183) tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora:
> FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)

FWIW, yes, I'm seeing those.


r~



* Re: 'make check-acceptance' failing on s390 tests?
  2022-02-18 15:04 'make check-acceptance' failing on s390 tests? Peter Maydell
  2022-02-18 23:17 ` Richard Henderson
@ 2022-02-21 15:27 ` Thomas Huth
  2022-03-11 17:52 ` Thomas Huth
  2 siblings, 0 replies; 4+ messages in thread
From: Thomas Huth @ 2022-02-21 15:27 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers
  Cc: Daniel P. Berrange, Beraldo Leal, Cornelia Huck,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	qemu-s390x, Cleber Rosa, Alex Bennée

On 18/02/2022 16.04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail on some of
> the s390 tests?
> 
>   (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
> 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
> (900.20 s)
> 
> 
>   (090/183) tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora:
> FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)
> 
> 
> I've cc'd Daniel because the 090 at least looks like a resolution
> baked into the test case, and commit de72c4b7c that went in
> last month changed the EDID reported resolution from 1024x768
> to 1280x800.

Yes, that seems to be right - since the default monitor resolution changed, 
the screenshot now has a different size, too. I sent a patch here:

https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg04473.html
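The failure message (b'1280 800\n' != b'1024 768\n') looks like a comparison against the dimension line of a PPM screendump. As an illustration only, and not the code from the actual test or from the patch, a helper along these lines would read the size from the header so that the expected resolution lives in one place. The function name is hypothetical, and the sketch assumes the magic number and the dimensions sit on the first two lines, with no comment lines in between.

```python
import io


def read_ppm_size(stream):
    """Return (width, height) from the header of a binary PPM (P6) stream.

    Assumes "P6\n" on the first line and "<width> <height>\n" on the
    second; general PPM files may also carry comment lines.
    """
    magic = stream.readline()
    if magic != b"P6\n":
        raise ValueError("not a binary PPM file: %r" % magic)
    width, height = (int(token) for token in stream.readline().split())
    return width, height


# Usage with an in-memory header carrying the new default resolution:
header = io.BytesIO(b"P6\n1280 800\n255\n")
size = read_ppm_size(header)
```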

> Not sure about the timeout on the boot test: the avocado log
> shows it booting at least as far as
> "Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
> and then there's no further output until the timeout.
> Unfortunately the avocado log doesn't seem to include useful
> information like "this is the string we were waiting to see", so
> I'm not sure exactly what's gone wrong there.
> 
> (I continue to find the Avocado tests rather opaque: when you
> get a series of green OK's that's fine, but when you get a failure
> it's often non-obvious why it failed or how to do simple things
> like "rerun just that one failed test" or "run the failing command,
> interactively on the command line".)

For me, it's even worse with tests/avocado/boot_linux.py: none of those
tests work on my local laptop, so I have always ignored them until now.
FWIW, I'm seeing this Python backtrace in the log:

  Reproduced traceback from: /home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/core/test.py:770
  Traceback (most recent call last):
    File "/home/thuth/tmp/qemu-build/tests/avocado/boot_linux.py", line 30, in test_pc_i440fx_tcg
      self.launch_and_wait(set_up_ssh_connection=False)
    File "/home/thuth/tmp/qemu-build/tests/avocado/avocado_qemu/__init__.py", line 636, in launch_and_wait
      cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), self.name)
    File "/home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/utils/cloudinit.py", line 192, in wait_for_phone_home
      s = PhoneHomeServer(address, instance_id)
    File "/home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/utils/cloudinit.py", line 173, in __init__
      HTTPServer.__init__(self, address, PhoneHomeServerHandler)
    File "/usr/lib64/python3.6/socketserver.py", line 456, in __init__
      self.server_bind()
    File "/usr/lib64/python3.6/http/server.py", line 136, in server_bind
      socketserver.TCPServer.server_bind(self)
    File "/usr/lib64/python3.6/socketserver.py", line 470, in server_bind
      self.socket.bind(self.server_address)
  TypeError: an integer is required (got type NoneType)

... no clue how to debug these problems, though.
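One reading of the traceback: the bottom frame is socket.bind() being handed an address tuple whose port is None, so self.phone_home_port apparently never received a value. The sketch below is not avocado code. Both helper names are hypothetical. It shows one common way to obtain a free port (bind to port 0 and let the kernel choose) and a guard that turns a missing port into a clearer error before it reaches bind().

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the kernel for an unused TCP port by binding to port 0.

    The port is released again when the socket closes, so another
    process could grab it before the caller does; this is a sketch,
    not a race-free allocator.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def checked_address(host, port):
    """Return (host, port), failing early if no port was found."""
    if not isinstance(port, int):
        raise ValueError("no usable port for %r: got %r" % (host, port))
    return (host, port)
```

With a guard like this the failure would point at the port lookup rather than at a TypeError deep inside socketserver.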

> The 090 failure didn't cause the merge to be rejected because
> in commit 333168efe5c8 we disabled both these tests when
> running on GitLab.
> 
> Suggestion: we should either disable tests entirely (except
> for manual "I want to run this known-flaky test") or not at
> all, rather than disabling them only on GitLab. If I'm running
> 'make check-acceptance' locally I don't want to be distracted
> by tests we know to be dodgy, any more than if I were running
> the CI on GitLab.

IIRC I only saw the occasional hangs of the test on GitLab, and never on my
local host, but I see your point. I'm fine if we replace the
@skipIf(os.getenv('GITLAB_CI')...) there with a
@skipUnless(os.getenv('AVOCADO_ALLOW_FLAKY_TESTS')...) or something similar.
Would you have some spare time to write such a patch?
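A minimal sketch of the opt-in idea, using the standard library's unittest.skipUnless so that it runs on its own. The environment variable name comes from the message above. The flaky() wrapper and the example test class are hypothetical, and the real tests would presumably use the decorators they already import.

```python
import os
import unittest

FLAKY_ENV = "AVOCADO_ALLOW_FLAKY_TESTS"


def flaky(reason="test is known to be flaky"):
    """Skip the decorated test unless FLAKY_ENV is set in the environment.

    The check happens when the decorator is applied (at import time),
    and any non-empty value, including "0", enables the test.
    """
    return unittest.skipUnless(
        os.getenv(FLAKY_ENV),
        "%s; set %s to run it" % (reason, FLAKY_ENV),
    )


class ExampleTest(unittest.TestCase):
    @flaky()
    def test_sometimes_hangs(self):
        self.assertTrue(True)
```

The test is then skipped everywhere by default, locally as well as in CI, and only runs when someone sets the variable on purpose.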

  Thomas




* Re: 'make check-acceptance' failing on s390 tests?
  2022-02-18 15:04 'make check-acceptance' failing on s390 tests? Peter Maydell
  2022-02-18 23:17 ` Richard Henderson
  2022-02-21 15:27 ` Thomas Huth
@ 2022-03-11 17:52 ` Thomas Huth
  2 siblings, 0 replies; 4+ messages in thread
From: Thomas Huth @ 2022-03-11 17:52 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers, Richard Henderson
  Cc: Daniel P. Berrange, Beraldo Leal, Cornelia Huck,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	qemu-s390x, Cleber Rosa, Alex Bennée

On 18/02/2022 16.04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail on some of
> the s390 tests?
> 
>   (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
> 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
> (900.20 s)
[...]
> Not sure about the timeout on the boot test: the avocado log
> shows it booting at least as far as
> "Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
> and then there's no further output until the timeout.

Now that I've finally been able to run the test again (after
manually tweaking that borked is_port_free() function in
avocado), I've had a closer look at the failing BootLinuxS390X
test: if you look at the guest output in the log, you can see
that it fails to initialize cloud-init and thus fails to
"phone home" at the end.

This used to work fine in older versions, so I just spent a
lot of time bisecting this issue and ended up here:

f83bcecb1ffe25a18367409eaf4ba1453c835c48 is the first bad commit
commit f83bcecb1ffe25a18367409eaf4ba1453c835c48
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Tue Jul 27 07:48:55 2021 -1000

     accel/tcg: Add cpu_{ld,st}*_mmu interfaces

Richard, could you please have a look at this one, too? ... it
causes the test to fail:

$ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48~1
$ ./configure --target-list=s390x-softmmu --disable-docs
$ make -j8
$ make check-venv
$ cd build
$ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
JOB ID     : 0a6d287620d150d52c24417d0a672a1a826b3a82
JOB LOG    : /home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
  (1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: PASS (130.38 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 136.51 s
$ grep cloud-ini /home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
...
2022-03-11 18:31:52,745 datadrainer      L0193 DEBUG| [  OK  ] Started Initial cloud-init…ob (metadata service crawler).

$ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48
$ make -j8
$ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
JOB ID     : cb143be36631515f74cb6de2b263dfe1bc0f9709
JOB LOG    : /home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
  (1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 'logdir': '/home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/test-res... (900.97 s)
RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
JOB TIME   : 907.16 s
$ grep cloud-ini /home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
2022-03-11 18:35:15,106 datadrainer      L0193 DEBUG|          Starting Initial cloud-init job (pre-networking)...
2022-03-11 18:35:21,691 datadrainer      L0193 DEBUG| [FAILED] Failed to start Initial cloud-init job (pre-networking).
...
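The grep above can also be narrowed to the failing units only. The sketch below scans log lines for systemd-style "[FAILED]" status markers that mention cloud-init. The regular expression is an assumption based solely on the three log lines quoted in this message, and the function name is hypothetical.

```python
import re

# Status markers as they appear in the guest console output, for example
# "[  OK  ] Started ..." or "[FAILED] Failed to start ...".
UNIT_STATUS = re.compile(r"\[\s*(?P<status>OK|FAILED)\s*\]\s+(?P<msg>.*)$")


def failed_units(lines, needle="cloud-init"):
    """Return the messages of [FAILED] status lines that mention `needle`."""
    hits = []
    for line in lines:
        match = UNIT_STATUS.search(line)
        if match and match.group("status") == "FAILED" and needle in match.group("msg"):
            hits.append(match.group("msg").strip())
    return hits


# Usage on an avocado job log (path supplied by the caller):
#     with open(job_log_path, encoding="utf-8", errors="replace") as log:
#         print(failed_units(log))
```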

  Thomas


