From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Thomas Huth" <thuth@redhat.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	quintela@redhat.com, qemu-devel@nongnu.org,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PATCH] tests/qtest/migration-test: Disable migration/multifd/tcp/plain/cancel
Date: Mon, 6 Mar 2023 14:09:33 +0000
Message-ID: <ZAX0HY+veH1ceH+G@work-vm>
In-Reply-To: <ZAXx5VerHrVQbSwU@redhat.com>

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Mar 06, 2023 at 01:44:38PM +0000, Dr. David Alan Gilbert wrote:
> > * Thomas Huth (thuth@redhat.com) wrote:
> > > On 03/03/2023 13.05, Peter Maydell wrote:
> > > > On Fri, 3 Mar 2023 at 11:29, Thomas Huth <thuth@redhat.com> wrote:
> > > > > 
> > > > > On 03/03/2023 12.18, Peter Maydell wrote:
> > > > > > On Fri, 3 Mar 2023 at 09:10, Juan Quintela <quintela@redhat.com> wrote:
> > > > > > > 
> > > > > > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > > > > On Thu, Mar 02, 2023 at 05:22:11PM +0000, Peter Maydell wrote:
> > > > > > > > > migration-test has been flaky for a long time, both in CI and
> > > > > > > > > otherwise:
> > > > > > > > > 
> > > > > > > > > https://gitlab.com/qemu-project/qemu/-/jobs/3806090216
> > > > > > > > > (a FreeBSD job)
> > > > > > > > >     32/648 ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
> > > > > > > > > 
> > > > > > > > > on a local macos x86 box:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > What is really weird with this failure is that:
> > > > > > > - it only happens on non-x86
> > > > > > 
> > > > > > No, I have seen it on x86 macos, and x86 OpenBSD
> > > > > > 
> > > > > > > - on code that is not arch dependent
> > > > > > > - on cancel, what we really do there is close fd's for the multifd
> > > > > > >     channel threads to get out of the recv, i.e. again, nothing that
> > > > > > >     should be arch dependent.
> > > > > > 
> > > > > > I'm pretty sure that it tends to happen when the machine that's
> > > > > > running the test is heavily loaded. You probably have a race condition.
> > > > > 
> > > > > I think I can second that. IIRC I've seen it a couple of times on my x86
> > > > > laptop when running "make check -j$(nproc) SPEED=slow" here.
> > > > 
> > > > And another on-x86 failure case, just now, on the FreeBSD x86 CI job:
> > > > https://gitlab.com/qemu-project/qemu/-/jobs/3870165180
> > > 
> > > And FWIW, I just saw this while doing "make vm-build-netbsd J=4":
> > > 
> > > ▶  31/645 ERROR:../src/tests/qtest/migration-test.c:1841:test_migrate_auto_converge: 'got_stop' should be FALSE ERROR
> > 
> > That one is kind of interesting; this is an auto converge test - so it
> > tries to set up migration so it won't finish, to check that the auto
> > converge kicks in.  Except in this case the migration *did* finish
> > without the autoconverge (significantly) kicking in.
> > 
> > So I guess any of:
> >   a) The CPU thread never got much CPU time so not much dirtying
> > happened.
> >   b) The bandwidth calculations might be bad enough/coarse enough
> > that it's passing the (very low) bandwidth limit due to a bad
> > approximation of the bandwidth needed.
> >   c) The autoconverge jump happens fast enough that the migration
> > completes, and got_stop is set, within one sleep interval of that loop.
> > 
> > I guess we could:
> >   i) Reduce the usleep in test_migrate_auto_converge
> >     (So it is more likely to correctly drop out of that loop
> >     as soon as autoconverge kicks in)
> 
> The CPU time spent by the dirtying guest CPUs should dominate
> here, so we can afford to reduce that timeout down a bit to
> be more responsive.
> 
> >   ii) Reduce inc_pct so that autoconverge kicks in slower
> >   iii) Reduce max-bandwidth in migrate_ensure_non_converge
> >      even further.
> 
> migrate_ensure_non_converge is trying to guarantee non-convergence,
> but obviously we're only achieving a probabilistic chance of
> non-convergence. To get the probability closer to 100% we should make
> it massively smaller, say 100kbs instead of 30mbs.

Yeh, I'll cut a patch for this.
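
As a rough back-of-the-envelope sketch of why the smaller limit should
help: in a simplified model, migration can complete once the remaining
dirty data fits into (bandwidth * downtime limit). The snippet below
assumes the figures above mean bytes per second, a 1ms downtime limit
and 4KiB pages; all of those, and the helper names, are assumptions
for illustration rather than values taken from the test.

```python
PAGE_SIZE = 4096  # assumed guest page size in bytes


def threshold_bytes(bandwidth_bytes_per_s, downtime_limit_ms):
    """Dirty bytes that can be sent within the allowed downtime."""
    return bandwidth_bytes_per_s * downtime_limit_ms / 1000


def can_converge(pending_dirty_bytes, bandwidth_bytes_per_s, downtime_limit_ms):
    """Simplified model: converge once pending data fits in the downtime."""
    limit = threshold_bytes(bandwidth_bytes_per_s, downtime_limit_ms)
    return pending_dirty_bytes <= limit


# Compare the two bandwidth limits with an assumed 1ms downtime limit.
for label, bandwidth in (("30 MB/s", 30_000_000), ("100 KB/s", 100_000)):
    limit = threshold_bytes(bandwidth, 1)
    print(f"{label}: threshold {limit:.0f} bytes, "
          f"{limit / PAGE_SIZE:.2f} pages")
```

Under those assumptions the larger limit leaves room for a handful of
dirty pages, so a guest that is starved of CPU can slip under it; with
the smaller limit the threshold is below one page, so a single dirty
page is enough to stop convergence.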

Dave

> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


