qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	Leonardo Bras <leobras@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
Date: Thu, 1 Jun 2023 12:22:36 -0400	[thread overview]
Message-ID: <ZHjFzH9zT34MIBEv@x1n> (raw)
In-Reply-To: <ZHjBw2E+eJKNsniO@redhat.com>

On Thu, Jun 01, 2023 at 05:05:23PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 01, 2023 at 11:46:01AM -0400, Peter Xu wrote:
> > On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
> > > The migration test cases that actually exercise live migration want to
> > > ensure there is a minimum of two iterations of pre-copy, in order to
> > > exercise the dirty tracking code.
> > > 
> > > Historically we've queried the migration status, looking for the
> > > 'dirty-sync-count' value to increment to track iterations. This was
> > > not entirely reliable because often all the data would get transferred
> > > quickly enough that the migration would finish before we wanted it
> > > to. So we massively dropped the bandwidth and max downtime to
> > > guarantee non-convergance. This had the unfortunate side effect
> > > that every migration took at least 30 seconds to run (100 MB of
> > > dirty pages / 3 MB/sec).
> > > 
> > > This optimization takes a different approach to ensuring that a
> > > mimimum of two iterations. Rather than waiting for dirty-sync-count
> > > to increment, directly look for an indication that the source VM
> > > has dirtied RAM that has already been transferred.
> > > 
> > > On the source VM a magic marker is written just after the 3 MB
> > > offset. The destination VM is now montiored to detect when the
> > > magic marker is transferred. This gives a guarantee that the
> > > first 3 MB of memory have been transferred. Now the source VM
> > > memory is monitored at exactly the 3MB offset until we observe
> > > a flip in its value. This gives us a guaranteed that the guest
> > > workload has dirtied a byte that has already been transferred.
> > > 
> > > Since we're looking at a place that is only 3 MB from the start
> > > of memory, with the 3 MB/sec bandwidth, this test should complete
> > > in 1 second, instead of 30 seconds.
> > > 
> > > Once we've proved there is some dirty memory, migration can be
> > > set back to full speed for the remainder of the 1st iteration,
> > > and the entire of the second iteration at which point migration
> > > should be complete.
> > > 
> > > On a test machine this further reduces the migration test time
> > > from 8 minutes to 1 minute 40.
> > 
> > The outcome is definitely nice, but it does looks slightly hacky to me and
> > make the test slightly more complicated.
> > 
> > If it's all about making sure we finish the 1st iteration, can we simply
> > add a src qemu parameter "switchover-hold"?  If it's set, src never
> > switchover to dst but keeps the iterations.
> 
> For *most* of the tests, we want to ensure there are a minimum
> of 2 iterations. For the XBZRLE test we want to ensure there are
> a minimum of 3 iterations, so the cache gets  workout.
> 
> > Then migrate_ensure_non_converge() will be as simple as setting
> > switchover-hold to true.
> > 
> > I am even thinking whether there can even be real-life use case for that,
> > e.g., where a user might want to have a pre-heat of a migration of some VM,
> > and trigger it immediately when the admin really wants (the pre-heats moved
> > most of the pages and keep doing so).
> > 
> > It'll be also similar to what Avihai proposed here on switchover-ack, just
> > an ack mechanism on the src side:
> > 
> > https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com
> 
> In general I strongly wanted to avoid adding special logic to the
> migration code that makes it directly synchronize with the  test
> suite, because once we do that I don't think the test suite is a
> providing coverage of the real world usage scenario.

The problem is non-live is already not real world usage in most cases.  It
seems we all agreed that it's the code paths to cover not real world usages
in the tests, or maybe not?

> 
> IOW, if we add a switchover-ack feature, we should certainly have
> *a* test that uses it, but we should not modify other tests because
> we want to exercise the logic as it would run with apps that don't
> rely on switchover-ack.
> 
> Also, this slow migration test is incredibly painful for people right
> now, so I'd like to see us get a speed up committed to git quickly.
> I don't want to block it on a feature proposal that might yet take
> months to merge.

That'll be trivial, afaict.

I just worry that this patch will bring complexity to the test cases,
that's another burden we need to carry besides QEMU itself.

If you want, I can try to prepare such a patch before this weekend, and if
it's complicated enough and take more than next week to review feel free to
go ahead with this one.

I understand the migration test issue was there for a long time.  But still
to me it's important on which may be cleaner for the long term too.

-- 
Peter Xu



  reply	other threads:[~2023-06-01 16:23 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
2023-05-31 13:23 ` [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants Daniel P. Berrangé
2023-06-01  9:23   ` Thomas Huth
2023-06-01 12:48     ` Daniel P. Berrangé
2023-06-01 12:04   ` Juan Quintela
2023-06-01 12:20   ` Juan Quintela
2023-06-01 12:51     ` Daniel P. Berrangé
2023-05-31 13:23 ` [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events Daniel P. Berrangé
2023-05-31 14:57   ` Thomas Huth
2023-06-01 12:14   ` Juan Quintela
2023-06-01 12:56     ` Daniel P. Berrangé
2023-05-31 13:23 ` [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test Daniel P. Berrangé
2023-06-01  9:26   ` Thomas Huth
2023-06-01  9:32     ` Daniel P. Berrangé
2023-06-01 12:17   ` Juan Quintela
2023-05-31 13:23 ` [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage " Daniel P. Berrangé
2023-06-01  9:28   ` Thomas Huth
2023-06-01 12:10   ` Juan Quintela
2023-05-31 13:23 ` [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event Daniel P. Berrangé
2023-06-01  9:31   ` Thomas Huth
2023-06-01 12:23   ` Juan Quintela
2023-05-31 13:23 ` [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success Daniel P. Berrangé
2023-06-01  9:37   ` Thomas Huth
2023-06-01 12:27   ` Juan Quintela
2023-05-31 13:23 ` [PATCH v3 7/9] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
2023-06-01  9:38   ` Thomas Huth
2023-06-01 12:31   ` Juan Quintela
2023-06-01 12:34     ` Daniel P. Berrangé
2023-06-01 12:37       ` Juan Quintela
2023-06-01 12:44         ` Daniel P. Berrangé
2023-05-31 13:23 ` [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
2023-06-01  9:47   ` Thomas Huth
2023-06-01 12:33   ` Juan Quintela
2023-06-01 12:38     ` Daniel P. Berrangé
2023-06-01 16:09     ` Thomas Huth
2023-06-01 16:17       ` Daniel P. Berrangé
2023-06-01 16:26         ` Peter Xu
2023-06-01 15:30   ` Peter Xu
2023-06-01 15:39     ` Daniel P. Berrangé
2023-06-01 15:53       ` Peter Xu
2023-06-01 15:55         ` Daniel P. Berrangé
2023-06-01 16:17           ` Peter Xu
2023-06-01 16:35             ` Daniel P. Berrangé
2023-06-01 16:59               ` Peter Xu
2023-06-01 22:58             ` Juan Quintela
2023-06-01 22:55           ` Juan Quintela
2023-05-31 13:24 ` [PATCH v3 9/9] tests/qtest: massively speed up migration-test Daniel P. Berrangé
2023-06-01 10:04   ` Thomas Huth
2023-06-01 15:46   ` Peter Xu
2023-06-01 16:05     ` Daniel P. Berrangé
2023-06-01 16:22       ` Peter Xu [this message]
2023-06-01 16:36         ` Daniel P. Berrangé
2023-06-01 17:04           ` Peter Xu
2023-06-01 23:00     ` Juan Quintela
2023-06-01 23:43       ` Peter Xu
2023-07-10  9:35         ` Daniel P. Berrangé
2023-07-10  9:40           ` Thomas Huth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZHjFzH9zT34MIBEv@x1n \
    --to=peterx@redhat.com \
    --cc=berrange@redhat.com \
    --cc=leobras@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).