qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Hyman Huang" <yong.huang@smartx.com>,
	qemu-devel@nongnu.org, "Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle
Date: Tue, 10 Sep 2024 17:22:45 -0400	[thread overview]
Message-ID: <ZuC4pYT-atQwWePv@x1n> (raw)
In-Reply-To: <87frq8lcgp.fsf@suse.de>

On Mon, Sep 09, 2024 at 06:54:46PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Sep 09, 2024 at 03:02:57PM +0100, Peter Maydell wrote:
> >> On Mon, 9 Sept 2024 at 14:51, Hyman Huang <yong.huang@smartx.com> wrote:
> >> >
> >> > Despite the fact that the responsive CPU throttle is enabled,
> >> > the dirty sync count may not always increase because this is
> >> > an optimization that might not happen in any situation.
> >> >
> >> > This test case just making sure it doesn't interfere with any
> >> > current functionality.
> >> >
> >> > Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> >> 
> >> tests/qtest/migration-test already runs 75 different
> >> subtests, takes up a massive chunk of our "make check"
> >> time, and is very commonly a "times out" test on some
> >> of our CI jobs. It runs on five different guest CPU
> >> architectures, each one of which takes between 2 and
> >> 5 minutes to complete the full migration-test.
> >> 
> >> Do we really need to make it even bigger?
> >
> > I'll try to find some time in the next few weeks looking into this to see
> > whether we can further shrink migration test times after previous attemps
> > from Dan.  At least a low hanging fruit is we should indeed put some more
> > tests into g_test_slow(), and this new test could also be a candidate (then
> > we can run "-m slow" for migration PRs only).
> 
> I think we could (using -m slow or any other method) separate tests
> that are generic enough that every CI run should benefit from them
> vs. tests that are only useful once someone starts touching migration
> code. I'd say very few in the former category and most of them in the
> latter.
> 
> For an idea of where migration bugs lie, I took a look at what was
> fixed since 2022:
> 
> # bugs | device/subsystem/arch
> ----------------------------------
>     54 | migration
>     10 | vfio
>      6 | ppc
>      3 | virtio-gpu
>      2 | pcie_sriov, tpm_emulator,
>           vdpa, virtio-rng-pci
>      1 | arm, block, gpio, lasi,
>           pci, s390, scsi-disk,
>           virtio-mem, TCG

Just curious; how did you collect these?

> 
> From these, ignoring the migration bugs, the migration-tests cover some
> of: arm, ppc, s390, TCG. The device_opts[1] patch hasn't merged yet, but
> once it is, then virtio-gpu would be covered and we could investigate
> adding some of the others.
> 
> For actual migration code issues:
> 
> # bugs | (sub)subsystem | kind
> ----------------------------------------------
>     13 | multifd        | correctness/races
>      8 | ram            | correctness
>      8 | rdma:          | general programming

8 rdma bugs??? ouch..

>      7 | qmp            | new api bugs
>      5 | postcopy       | races
>      4 | file:          | leaks
>      3 | return path    | races
>      3 | fd_cleanup     | races
>      2 | savevm, aio/coroutines
>      1 | xbzrle, colo, dirtyrate, exec:,
>           windows, iochannel, qemufile,
>           arch (ppc64le)
> 
> Here, the migration-tests cover well: multifd, ram, qmp, postcopy,
> file, rp, fd_cleanup, iochannel, qemufile, xbzrle.
> 
> My suggestion is we run per arch:
> 
> "/precopy/tcp/plain"
> "/precopy/tcp/tls/psk/match",
> "/postcopy/plain"
> "/postcopy/preempt/plain"
> "/postcopy/preempt/recovery/plain"
> "/multifd/tcp/plain/cancel"
> "/multifd/tcp/uri/plain/none"

Don't you want to still keep a few multifd / file tests?

IIUC some file ops can still be relevant to archs.  Multifd still has one
bug that can only reproduce on arm64.. but not x86_64.  I remember it's a
race condition when migration finishes, and the issue could be memory
ordering relevant, but maybe not.

> 
> and x86 gets extra:
> 
> "/precopy/unix/suspend/live"
> "/precopy/unix/suspend/notlive"
> "/dirty_ring"

dirty ring will be disabled anyway when !x86, so probably not a major
concern.

> 
> (the other dirty_* tests are too slow)

These are the 10 slowest tests when I run locally:

/x86_64/migration/multifd/tcp/tls/x509/allow-anon-client 2.41
/x86_64/migration/postcopy/recovery/plain 2.43
/x86_64/migration/multifd/tcp/tls/x509/default-host 2.66
/x86_64/migration/multifd/tcp/tls/x509/override-host 2.86
/x86_64/migration/postcopy/tls/psk 2.91
/x86_64/migration/postcopy/preempt/recovery/tls/psk 3.08
/x86_64/migration/postcopy/preempt/tls/psk 3.30
/x86_64/migration/postcopy/recovery/tls/psk 3.81
/x86_64/migration/vcpu_dirty_limit 13.29
/x86_64/migration/precopy/unix/xbzrle 27.55

Are you aware of people using xbzrle at all?

> 
> All the rest go behind a knob that people touching migration code will
> enable.
> 
> wdyt?

Agree with the general idea, but I worry above exact list can be too small.

IMHO we can definitely, at least, move the last two into slow list
(vcpu_dirty_limit and xbzrle), then it'll already save us 40sec each run..

> 
> 1- allows adding devices to QEMU cmdline for migration-test
> https://lore.kernel.org/r/20240523201922.28007-4-farosas@suse.de
> 

-- 
Peter Xu



  reply	other threads:[~2024-09-10 21:23 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-09 13:47 [PATCH RFC 00/10] migration: auto-converge refinements for huge VM Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 01/10] migration: Introduce structs for periodic CPU throttle Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 02/10] migration: Refine util functions to support " Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 03/10] qapi/migration: Introduce periodic CPU throttling parameters Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 04/10] qapi/migration: Introduce the iteration-count Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 05/10] migration: Introduce util functions for periodic CPU throttle Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 06/10] migration: Support " Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 07/10] tests/migration-tests: Add test case for periodic throttle Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 08/10] migration: Introduce cpu-responsive-throttle parameter Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 09/10] migration: Support responsive CPU throttle Hyman Huang
2024-09-09 13:47 ` [PATCH RFC 10/10] tests/migration-tests: Add test case for " Hyman Huang
2024-09-09 14:02   ` Peter Maydell
2024-09-09 14:36     ` Peter Xu
2024-09-09 21:54       ` Fabiano Rosas
2024-09-10 21:22         ` Peter Xu [this message]
2024-09-10 22:23           ` Fabiano Rosas
2024-09-11 15:59             ` Peter Xu
2024-09-11 19:48               ` Fabiano Rosas
2024-09-11 20:37                 ` Peter Xu
2024-09-11 21:26                   ` Fabiano Rosas
2024-09-12  8:13                     ` Peter Maydell
2024-09-12 13:48                       ` Fabiano Rosas
2024-09-12 14:09                         ` Peter Maydell
2024-09-12 14:28                           ` Fabiano Rosas
2024-09-12 15:09                       ` Peter Xu
2024-09-12 15:14                         ` Peter Maydell
2024-09-13 15:02                           ` Peter Xu
2024-09-12 15:37                         ` Fabiano Rosas
2024-09-12 22:52                           ` Fabiano Rosas
2024-09-13 15:00                             ` Peter Xu
2024-09-13 15:09                               ` Fabiano Rosas
2024-09-13 15:17                                 ` Fabiano Rosas
2024-09-13 15:38                                   ` Peter Xu
2024-09-13 17:51                                     ` Fabiano Rosas
2024-09-09 14:43     ` Yong Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZuC4pYT-atQwWePv@x1n \
    --to=peterx@redhat.com \
    --cc=armbru@redhat.com \
    --cc=david@redhat.com \
    --cc=eblake@redhat.com \
    --cc=farosas@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=yong.huang@smartx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).