qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: "Fabiano Rosas" <farosas@suse.de>,
	"Hyman Huang" <yong.huang@smartx.com>,
	qemu-devel@nongnu.org, "Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle
Date: Thu, 12 Sep 2024 11:09:11 -0400	[thread overview]
Message-ID: <ZuMEF99PF0q0U9G-@x1n> (raw)
In-Reply-To: <CAFEAcA9YkZiSSOAj0zH2OwF9AcziJT-zpnNVQn8BXizhSXHVOA@mail.gmail.com>

On Thu, Sep 12, 2024 at 09:13:16AM +0100, Peter Maydell wrote:
> On Wed, 11 Sept 2024 at 22:26, Fabiano Rosas <farosas@suse.de> wrote:
> > I don't think we're discussing total CI time at this point, so the math
> > doesn't really add up. We're not looking into making the CI finish
> > faster. We're looking into making migration-test finish faster. That
> > would reduce timeouts in CI, speed-up make check and reduce the chance
> > of random race conditions* affecting other people/staging runs.
> 
> Right. The reason migration-test appears on my radar is because
> it is very frequently the thing that shows up as "this sometimes
> just fails or just times out and if you hit retry it goes away
> again". That might not be migration-test's fault specifically,
> because those retries tend to be certain CI configs (s390,
> the i686-tci one), and I have some theories about what might be
> causing it (e.g. build system runs 4 migration-tests in parallel,
> which means 8 QEMU processes which is too many for the number
> of host CPUs). But right now I look at CI job failures and my reaction
> is "oh, it's the migration-test failing yet again" :-(
> 
> For some examples from this week:
> 
> https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373  <--------[1]
> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152  <--------[2]
> https://gitlab.com/qemu-project/qemu/-/jobs/7786579155

Ah right, the TIMEOUT is unfortunate, especially if tests can be run in
parallel.  It indeed sounds like no good way to finally solve.. I don't
also see how speeding up / reducing tests in migration test would help, as
that's (from some degree..) is the same as tuning the timeout value bigger.
When the tests are less it'll fit into 480s window, but maybe it's too
quick now we wonder whether we should shrink it to e.g. 90s, but then it
can timeout again when on a busy host with less capability of concurrency.

But indeed there're two ERRORs ([1,2] above)..  I collected some more info
here before the log expires:

=================================8<================================

*** /i386/migration/multifd/tcp/plain/cancel, qtest-i386 on s390 host

https://gitlab.com/qemu-project/qemu/-/jobs/7799842373

101/953 qemu:qtest+qtest-i386 / qtest-i386/migration-test                         ERROR          144.32s   killed by signal 6 SIGABRT
>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh PYTHON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/pyvenv/bin/python3 QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=144 QTEST_QEMU_BINARY=./qemu-system-i386 /home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
warning: fd: migration to a file is deprecated. Use file: instead.
warning: fd: migration to a file is deprecated. Use file: instead.
../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
(test program exited with status code -6)
TAP parsing error: Too few tests run (expected 53, got 39)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

# Start of plain tests
# Running /i386/migration/multifd/tcp/plain/cancel
# Using machine type: pc-i440fx-9.2
# starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3273509.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3273509.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine pc-i440fx-9.2, -name source,debug-threads=on -m 150M -serial file:/tmp/migration-test-4112T2/src_serial -drive if=none,id=d0,file=/tmp/migration-test-4112T2/bootsect,format=raw -device ide-hd,drive=d0,secs=1,cyls=1,heads=1    2>/dev/null -accel qtest
# starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3273509.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3273509.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine pc-i440fx-9.2, -name target,debug-threads=on -m 150M -serial file:/tmp/migration-test-4112T2/dest_serial -incoming defer -drive if=none,id=d0,file=/tmp/migration-test-4112T2/bootsect,format=raw -device ide-hd,drive=d0,secs=1,cyls=1,heads=1    2>/dev/null -accel qtest
----------------------------------- stderr -----------------------------------
warning: fd: migration to a file is deprecated. Use file: instead.
warning: fd: migration to a file is deprecated. Use file: instead.
../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)

*** /ppc64/migration/multifd/tcp/plain/cancel, qtest-ppc64 on i686 host

https://gitlab.com/qemu-project/qemu/-/jobs/7786579152

174/315 qemu:qtest+qtest-ppc64 / qtest-ppc64/migration-test                       ERROR          381.00s   killed by signal 6 SIGABRT
>>> PYTHON=/builds/qemu-project/qemu/build/pyvenv/bin/python3 QTEST_QEMU_IMG=./qemu-img G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-ppc64 MALLOC_PERTURB_=178 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-ppc64: Cannot read from TLS channel: The TLS connection was non-properly terminated.
warning: fd: migration to a file is deprecated. Use file: instead.
warning: fd: migration to a file is deprecated. Use file: instead.
../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
(test program exited with status code -6)
TAP parsing error: Too few tests run (expected 61, got 47)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

# Start of plain tests
# Running /ppc64/migration/multifd/tcp/plain/cancel
# Using machine type: pseries-9.2
# starting QEMU: exec ./qemu-system-ppc64 -qtest unix:/tmp/qtest-40766.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-40766.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine pseries-9.2,vsmt=8 -name source,debug-threads=on -m 256M -serial file:/tmp/migration-test-H0Z1T2/src_serial -nodefaults -machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off, -bios /tmp/migration-test-H0Z1T2/bootsect    2>/dev/null -accel qtest
# starting QEMU: exec ./qemu-system-ppc64 -qtest unix:/tmp/qtest-40766.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-40766.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine pseries-9.2,vsmt=8 -name target,debug-threads=on -m 256M -serial file:/tmp/migration-test-H0Z1T2/dest_serial -incoming defer -nodefaults -machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off, -bios /tmp/migration-test-H0Z1T2/bootsect    2>/dev/null -accel qtest
----------------------------------- stderr -----------------------------------
qemu-system-ppc64: Cannot read from TLS channel: The TLS connection was non-properly terminated.
warning: fd: migration to a file is deprecated. Use file: instead.
warning: fd: migration to a file is deprecated. Use file: instead.
../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)

(test program exited with status code -6)
=================================8<================================

So.. it's the same test (multifd/tcp/plain/cancel) that is failing on
different host / arch being tested.  What is more weird is the two failures
are different, the 2nd failure throw out a TLS error even though the test
doesn't yet have tls involved.

Fabiano, is this the issue you're looking at?

Peter, do you think it'll be helpful if we temporarily mark this test as
"slow" too so it's not run in CI (so we still run it ourselves when prepare
migration PR, with the hope that it can reproduce)?

Thanks,

-- 
Peter Xu



  parent reply	other threads:[~2024-09-12 15:10 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-09 13:47 [PATCH RFC 00/10] migration: auto-converge refinements for huge VM Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 01/10] migration: Introduce structs for periodic CPU throttle Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 02/10] migration: Refine util functions to support " Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 03/10] qapi/migration: Introduce periodic CPU throttling parameters Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 04/10] qapi/migration: Introduce the iteration-count Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 05/10] migration: Introduce util functions for periodic CPU throttle Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 06/10] migration: Support " Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 07/10] tests/migration-tests: Add test case for periodic throttle Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 08/10] migration: Introduce cpu-responsive-throttle parameter Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 09/10] migration: Support responsive CPU throttle Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 10/10] tests/migration-tests: Add test case for " Hyman Huang
2024-09-09 14:02   ` Peter Maydell
2024-09-09 14:36     ` Peter Xu
2024-09-09 21:54       ` Fabiano Rosas
2024-09-10 21:22         ` Peter Xu
2024-09-10 22:23           ` Fabiano Rosas
2024-09-11 15:59             ` Peter Xu
2024-09-11 19:48               ` Fabiano Rosas
2024-09-11 20:37                 ` Peter Xu
2024-09-11 21:26                   ` Fabiano Rosas
2024-09-12  8:13                     ` Peter Maydell
2024-09-12 13:48                       ` Fabiano Rosas
2024-09-12 14:09                         ` Peter Maydell
2024-09-12 14:28                           ` Fabiano Rosas
2024-09-12 15:09                       ` Peter Xu [this message]
2024-09-12 15:14                         ` Peter Maydell
2024-09-13 15:02                           ` Peter Xu
2024-09-12 15:37                         ` Fabiano Rosas
2024-09-12 22:52                           ` Fabiano Rosas
2024-09-13 15:00                             ` Peter Xu
2024-09-13 15:09                               ` Fabiano Rosas
2024-09-13 15:17                                 ` Fabiano Rosas
2024-09-13 15:38                                   ` Peter Xu
2024-09-13 17:51                                     ` Fabiano Rosas
2024-09-09 14:43     ` Yong Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZuMEF99PF0q0U9G-@x1n \
    --to=peterx@redhat.com \
    --cc=armbru@redhat.com \
    --cc=david@redhat.com \
    --cc=eblake@redhat.com \
    --cc=farosas@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=yong.huang@smartx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).