From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
"Hyman Huang" <yong.huang@smartx.com>,
qemu-devel@nongnu.org, "Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"David Hildenbrand" <david@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle
Date: Tue, 10 Sep 2024 17:22:45 -0400
Message-ID: <ZuC4pYT-atQwWePv@x1n>
In-Reply-To: <87frq8lcgp.fsf@suse.de>
On Mon, Sep 09, 2024 at 06:54:46PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Mon, Sep 09, 2024 at 03:02:57PM +0100, Peter Maydell wrote:
> >> On Mon, 9 Sept 2024 at 14:51, Hyman Huang <yong.huang@smartx.com> wrote:
> >> >
> >> > Despite the fact that the responsive CPU throttle is enabled,
> >> > the dirty sync count may not always increase because this is
> >> > an optimization that might not take effect in every situation.
> >> >
> >> > This test case just makes sure it doesn't interfere with any
> >> > current functionality.
> >> >
> >> > Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> >>
> >> tests/qtest/migration-test already runs 75 different
> >> subtests, takes up a massive chunk of our "make check"
> >> time, and is very commonly a "times out" test on some
> >> of our CI jobs. It runs on five different guest CPU
> >> architectures, each one of which takes between 2 and
> >> 5 minutes to complete the full migration-test.
> >>
> >> Do we really need to make it even bigger?
> >
> > I'll try to find some time in the next few weeks to look into this and see
> > whether we can further shrink migration test times after the previous
> > attempts from Dan. At least one low-hanging fruit is that we should indeed
> > put some more tests behind g_test_slow(), and this new test could also be a
> > candidate (then we can run "-m slow" for migration PRs only).
>
> I think we could (using -m slow or any other method) separate tests
> that are generic enough that every CI run should benefit from them
> vs. tests that are only useful once someone starts touching migration
> code. I'd say very few in the former category and most of them in the
> latter.
>
> For an idea of where migration bugs lie, I took a look at what was
> fixed since 2022:
>
> # bugs | device/subsystem/arch
> ----------------------------------
>     54 | migration
>     10 | vfio
>      6 | ppc
>      3 | virtio-gpu
>      2 | pcie_sriov, tpm_emulator,
>        vdpa, virtio-rng-pci
>      1 | arm, block, gpio, lasi,
>        pci, s390, scsi-disk,
>        virtio-mem, TCG
Just curious; how did you collect these?
>
> From these, ignoring the migration bugs, the migration-tests cover some
> of: arm, ppc, s390, TCG. The device_opts[1] patch hasn't merged yet, but
> once it is, then virtio-gpu would be covered and we could investigate
> adding some of the others.
>
> For actual migration code issues:
>
> # bugs | (sub)subsystem | kind
> ----------------------------------------------
>     13 | multifd        | correctness/races
>      8 | ram            | correctness
>      8 | rdma:          | general programming
8 rdma bugs??? ouch..
>      7 | qmp            | new api bugs
>      5 | postcopy       | races
>      4 | file:          | leaks
>      3 | return path    | races
>      3 | fd_cleanup     | races
>      2 | savevm, aio/coroutines
>      1 | xbzrle, colo, dirtyrate, exec:,
>        windows, iochannel, qemufile,
>        arch (ppc64le)
>
> Here, the migration-tests cover well: multifd, ram, qmp, postcopy,
> file, rp, fd_cleanup, iochannel, qemufile, xbzrle.
>
> My suggestion is we run per arch:
>
> "/precopy/tcp/plain"
> "/precopy/tcp/tls/psk/match"
> "/postcopy/plain"
> "/postcopy/preempt/plain"
> "/postcopy/preempt/recovery/plain"
> "/multifd/tcp/plain/cancel"
> "/multifd/tcp/uri/plain/none"
Don't you still want to keep a few multifd / file tests?
IIUC some file ops can still be arch-dependent. Multifd still has one
bug that reproduces only on arm64, not on x86_64. I remember it's a
race condition when migration finishes, and the issue could be related to
memory ordering, but maybe not.
>
> and x86 gets extra:
>
> "/precopy/unix/suspend/live"
> "/precopy/unix/suspend/notlive"
> "/dirty_ring"
dirty ring will be disabled anyway when !x86, so probably not a major
concern.
>
> (the other dirty_* tests are too slow)
These are the 10 slowest tests when I run locally (times in seconds):
  /x86_64/migration/multifd/tcp/tls/x509/allow-anon-client    2.41
  /x86_64/migration/postcopy/recovery/plain                   2.43
  /x86_64/migration/multifd/tcp/tls/x509/default-host         2.66
  /x86_64/migration/multifd/tcp/tls/x509/override-host        2.86
  /x86_64/migration/postcopy/tls/psk                          2.91
  /x86_64/migration/postcopy/preempt/recovery/tls/psk         3.08
  /x86_64/migration/postcopy/preempt/tls/psk                  3.30
  /x86_64/migration/postcopy/recovery/tls/psk                 3.81
  /x86_64/migration/vcpu_dirty_limit                         13.29
  /x86_64/migration/precopy/unix/xbzrle                      27.55
Are you aware of people using xbzrle at all?
>
> All the rest go behind a knob that people touching migration code will
> enable.
>
> wdyt?
Agree with the general idea, but I worry the exact list above may be too
small. IMHO we can definitely, at least, move the last two (vcpu_dirty_limit
and xbzrle) into the slow list; that alone would already save us about
40 seconds per run.
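
For reference, a rough sketch of the arithmetic behind that estimate. The
timings are copied from my local x86_64 run listed above; the script and its
variable names are purely illustrative and not part of any QEMU tooling.

```python
# Timings in seconds, copied from the local x86_64 run listed above.
# Keys are the test paths with the common "/x86_64/migration/" prefix dropped.
timings = {
    "multifd/tcp/tls/x509/allow-anon-client": 2.41,
    "postcopy/recovery/plain": 2.43,
    "multifd/tcp/tls/x509/default-host": 2.66,
    "multifd/tcp/tls/x509/override-host": 2.86,
    "postcopy/tls/psk": 2.91,
    "postcopy/preempt/recovery/tls/psk": 3.08,
    "postcopy/preempt/tls/psk": 3.30,
    "postcopy/recovery/tls/psk": 3.81,
    "vcpu_dirty_limit": 13.29,
    "precopy/unix/xbzrle": 27.55,
}

# Candidates for the slow list (i.e. only run when "-m slow" is passed).
slow = {"vcpu_dirty_limit", "precopy/unix/xbzrle"}

# Time no longer spent in a default run, and what is left of the top 10.
saved = sum(t for name, t in timings.items() if name in slow)
remaining = sum(timings.values()) - saved

print(f"moved to slow list: {saved:.2f}s")
print(f"left in top 10:     {remaining:.2f}s")
```

On these numbers the two tests account for 40.84s, leaving 23.46s for the
other eight, so they dominate the top 10 on this machine. Other hosts and
other archs will of course differ.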
>
> 1- allows adding devices to QEMU cmdline for migration-test
> https://lore.kernel.org/r/20240523201922.28007-4-farosas@suse.de
>
--
Peter Xu