qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Richard Henderson <richard.henderson@linaro.org>
Subject: Re: Migration tests are very slow in the CI
Date: Mon, 8 Aug 2022 18:00:44 +0100
Message-ID: <YvFBPJ204rtMx+WC@work-vm>
In-Reply-To: <YvEIcNZ/CnFzdpkS@redhat.com>

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Aug 08, 2022 at 02:43:49PM +0200, Thomas Huth wrote:
> > On 08/08/2022 14.14, Daniel P. Berrangé wrote:
> > > On Mon, Aug 08, 2022 at 01:57:17PM +0200, Thomas Huth wrote:
> > > > 
> > > >   Hi!
> > > > 
> > > > Seems like we're getting more timeouts in the CI pipelines since commit
> > > > 2649a72555e ("Allow test to run without uffd") enabled the migration tests
> > > > in more scenarios.
> > > > 
> > > > For example:
> > > > 
> > > >   https://gitlab.com/qemu-project/qemu/-/jobs/2821578332#L49
> > > > 
> > > > You can see that the migration-test ran for more than 20 minutes for each
> > > > target (x86 and aarch64)! I think that's way too much by default.
> > > 
> > > Definitely too much.
> > > 
> > > > I checked whether a single subtest is taking a lot of time, but it rather
> > > > seems like each of the migration tests takes 40 to 50 seconds in
> > > > the CI:
> > > > 
> > > >   https://gitlab.com/thuth/qemu/-/jobs/2825365836#L44
> > > 
> > > Normally with CI we expect a constant slowdown factor, eg x2.
> > > 
> > > I expect with migration though, we're triggering behaviour whereby
> > > the guest workload is generating dirty pages quicker than we can
> > > migrate them over localhost. The balance in this can quickly tip
> > > to create an exponential slowdown.
> > 
> > If I run the aarch64 migration-test on my otherwise idle x86 laptop, it
> > already takes ca. 460 seconds to finish, which is IMHO too much
> > for a normal "make check" run (without SPEED=slow).
> > 
> > > I'm not sure if 'g_test_slow' gives us enough granularity though, as
> > > if we enable that, it'll impact the whole test suite, not just
> > > migration tests.
> > 
> > We could also check for the GITLAB_CI environment variable, just as we
> > already do in some of the avocado-based tests ... but given the fact that
> > the migration test is already very slow on my normal x86 laptop, I think I'd
> > prefer if we added some checks with g_test_slow() in there ...
> > 
> > Are there any tests in migration-test.c that are rather redundant and could
> > be easily skipped in quick mode?
> 
> The trouble with migration is that there are a lot of subtle permutations
> that interact in weird ways, so we've got a lot of test scenarios, including
> many with TLS:
> 
> /x86_64/migration/bad_dest
> /x86_64/migration/fd_proto
> /x86_64/migration/validate_uuid
> /x86_64/migration/validate_uuid_error
> /x86_64/migration/validate_uuid_src_not_set
> /x86_64/migration/validate_uuid_dst_not_set
> /x86_64/migration/auto_converge
> /x86_64/migration/dirty_ring
> /x86_64/migration/vcpu_dirty_limit
> /x86_64/migration/postcopy/unix
> /x86_64/migration/postcopy/plain
> /x86_64/migration/postcopy/recovery/plain
> /x86_64/migration/postcopy/recovery/tls/psk
> /x86_64/migration/postcopy/preempt/plain
> /x86_64/migration/postcopy/preempt/recovery/plain
> /x86_64/migration/postcopy/preempt/recovery/tls/psk
> /x86_64/migration/postcopy/preempt/tls/psk
> /x86_64/migration/postcopy/tls/psk
> /x86_64/migration/precopy/unix/plain
> /x86_64/migration/precopy/unix/xbzrle
> /x86_64/migration/precopy/unix/tls/psk
> /x86_64/migration/precopy/unix/tls/x509/default-host
> /x86_64/migration/precopy/unix/tls/x509/override-host
> /x86_64/migration/precopy/tcp/plain
> /x86_64/migration/precopy/tcp/tls/psk/match
> /x86_64/migration/precopy/tcp/tls/psk/mismatch
> /x86_64/migration/precopy/tcp/tls/x509/default-host
> /x86_64/migration/precopy/tcp/tls/x509/override-host
> /x86_64/migration/precopy/tcp/tls/x509/mismatch-host
> /x86_64/migration/precopy/tcp/tls/x509/friendly-client
> /x86_64/migration/precopy/tcp/tls/x509/hostile-client
> /x86_64/migration/precopy/tcp/tls/x509/allow-anon-client
> /x86_64/migration/precopy/tcp/tls/x509/reject-anon-client
> /x86_64/migration/multifd/tcp/plain/none
> /x86_64/migration/multifd/tcp/plain/cancel
> /x86_64/migration/multifd/tcp/plain/zlib
> /x86_64/migration/multifd/tcp/plain/zstd
> /x86_64/migration/multifd/tcp/tls/psk/match
> /x86_64/migration/multifd/tcp/tls/psk/mismatch
> /x86_64/migration/multifd/tcp/tls/x509/default-host
> /x86_64/migration/multifd/tcp/tls/x509/override-host
> /x86_64/migration/multifd/tcp/tls/x509/mismatch-host
> /x86_64/migration/multifd/tcp/tls/x509/allow-anon-client
> /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
> 
> Each takes about 4 seconds, except for the xbzrle, autoconverge and
> vcpu-dirty-rate tests which take 8-12 seconds.
> 
> We could short-circuit most of the tls tests, because 90% of what
> they're validating is the initial connection setup phase. We don't
> really need to run the full migration to completion, we can just
> abort once we're running. Just keep 3 doing the full migration
> to completion - one precopy, one postcopy and one multifd.
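
To make the proposal above concrete, here is a rough back-of-the-envelope estimate in Python. It is only a sketch under stated assumptions: the counts are taken from the test list quoted above (44 entries, 23 of them with "/tls/" in the path, 3 slow ones), 10 seconds is an assumed midpoint of the quoted "8-12 seconds", 0.5 seconds is the short-circuited figure given in the same message, and the three tests kept at full length are assumed to be TLS tests. The names below are illustrative and do not come from migration-test.c.

```python
# Counts taken from the test list quoted above (x86_64 target only).
TOTAL_TESTS = 44     # all /x86_64/migration/* entries
SLOW_TESTS = 3       # xbzrle, auto-converge, vcpu dirty limit (none are TLS)
TLS_TESTS = 23       # entries with "/tls/" in the path

# Timings from the message; SLOW_SECS is an assumed midpoint of "8-12 seconds".
NORMAL_SECS = 4.0
SLOW_SECS = 10.0
SHORT_CIRCUIT_SECS = 0.5

# Assumption: the three tests kept at full length (one precopy, one postcopy,
# one multifd) are chosen from the TLS tests.
TLS_KEPT_FULL = 3


def estimate(short_circuit_tls: bool) -> float:
    """Return an estimated wall-clock time in seconds for one target."""
    total = SLOW_TESTS * SLOW_SECS + (TOTAL_TESTS - SLOW_TESTS) * NORMAL_SECS
    if short_circuit_tls:
        shortened = TLS_TESTS - TLS_KEPT_FULL
        total -= shortened * (NORMAL_SECS - SHORT_CIRCUIT_SECS)
    return total


if __name__ == "__main__":
    print(f"full runs:         {estimate(False):.0f} s")
    print(f"TLS short-circuit: {estimate(True):.0f} s")
```

Under these assumptions the estimate drops from about 194 to about 124 seconds per target on a machine where a test takes 4 seconds. The absolute numbers in CI would be much larger, given the 40 to 50 seconds per test reported earlier in the thread.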

I'd rather we combined some than cut stuff off; I was about to
suggest doing zlib with some of the TLS, but then that wouldn't have
found the recent zlib one!

Dave

> That'd cut most of the TLS tests from 4 seconds to 0.5 seconds.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
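
For reference, the GITLAB_CI check mentioned in the quoted discussion could look roughly like the following in a Python-based test, since the avocado tests are written in Python. This is a minimal sketch, not the code used in the QEMU tree: the helper `running_in_gitlab_ci` and the test class are hypothetical names, and migration-test.c itself is C and would use g_test_slow() or getenv() instead. The only external fact relied on is that GitLab CI sets the GITLAB_CI environment variable in its jobs.

```python
import os
import unittest


def running_in_gitlab_ci(environ=None) -> bool:
    """Return True when the GITLAB_CI variable is set to a non-empty value.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("GITLAB_CI"))


class SlowMigrationScenario(unittest.TestCase):
    """Hypothetical placeholder for a long-running migration scenario."""

    @unittest.skipIf(running_in_gitlab_ci(), "too slow for GitLab CI runners")
    def test_full_migration(self):
        # A real test would start the source and destination VMs here.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

As noted in the thread, a check like this only helps in CI. It does nothing for a slow local "make check" run, which is the argument for g_test_slow() instead.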




Thread overview: 5+ messages
2022-08-08 11:57 Migration tests are very slow in the CI Thomas Huth
2022-08-08 12:14 ` Daniel P. Berrangé
2022-08-08 12:43   ` Thomas Huth
2022-08-08 12:58     ` Daniel P. Berrangé
2022-08-08 17:00       ` Dr. David Alan Gilbert [this message]
