From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	"Daniel P . Berrange" <berrange@redhat.com>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH v4 00/19] migration: Postcopy Preemption
Date: Thu, 21 Apr 2022 14:57:37 +0100
Message-ID: <YmFi0S36y+79HnzR@work-vm>
In-Reply-To: <20220331150857.74406-1-peterx@redhat.com>

* Peter Xu (peterx@redhat.com) wrote:
> This is v4 of postcopy preempt series.  It can also be found here:
> 
>   https://github.com/xzpeter/qemu/tree/postcopy-preempt
> 
> RFC: https://lore.kernel.org/qemu-devel/20220119080929.39485-1-peterx@redhat.com
> V1:  https://lore.kernel.org/qemu-devel/20220216062809.57179-1-peterx@redhat.com
> V2:  https://lore.kernel.org/qemu-devel/20220301083925.33483-1-peterx@redhat.com
> V3:  https://lore.kernel.org/qemu-devel/20220330213908.26608-1-peterx@redhat.com

I've queued:
migration: Allow migrate-recover to run multiple times
migration: Move channel setup out of postcopy_try_recover()
migration: Export ram_load_postcopy()
migration: Move migrate_allow_multifd and helpers into migration.c
migration: Add pss.postcopy_requested status
migration: Drop multifd tls_hostname cache
migration: Postpone releasing MigrationState.hostname

> v4:
> - Fix a double-free on params.tls-creds when quitting qemu
> - Reorder patches to satisfy per-commit builds
> 
> v3:
> - Rebased to master since many patches landed
> - Fixed one bug on postcopy recovery when preempt is enabled; it was only
>   found when testing with TLS+recovery, because TLS changed the timing.
> - Dropped patch:
>   "migration: Fail postcopy preempt with TLS for now"
> - Added patches for TLS:
>   - "migration: Postpone releasing MigrationState.hostname"
>   - "migration: Drop multifd tls_hostname cache"
>   - "migration: Enable TLS for preempt channel"
>   - "migration: Export tls-[creds|hostname|authz] params to cmdline too"
>   - "tests: Add postcopy tls migration test"
>   - "tests: Add postcopy tls recovery migration test"
> - Added two more tests to the preempt test patch (tls, tls+recovery)
> 
> Abstract
> ========
> 
> This series adds a new migration capability called "postcopy-preempt".  It can
> be enabled when postcopy is enabled, and it simply (but greatly) speeds up
> the handling of postcopy page requests.
> 
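
As a rough illustration of enabling the capability: the sketch below builds a QMP migrate-set-capabilities request that turns on postcopy-ram together with the postcopy-preempt capability named above. The helper name build_enable_preempt_request is made up for this example, and how the request reaches a QEMU monitor is left out entirely.

```python
import json


def build_enable_preempt_request():
    """Build a QMP request enabling postcopy plus postcopy-preempt.

    Hypothetical helper for illustration only; the capability names and the
    migrate-set-capabilities command are the only parts taken from QEMU.
    """
    capabilities = [
        {"capability": "postcopy-ram", "state": True},
        {"capability": "postcopy-preempt", "state": True},
    ]
    return {
        "execute": "migrate-set-capabilities",
        "arguments": {"capabilities": capabilities},
    }


if __name__ == "__main__":
    # Print the request as it would be written to a QMP socket.
    print(json.dumps(build_enable_preempt_request(), indent=2))
```
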
> Below are some initial postcopy page request latency measurements with the
> new series applied.
> 
> For each page size, I measured page request latency for three cases:
> 
>   (a) Vanilla:                the old postcopy
>   (b) Preempt no-break-huge:  preempt enabled, x-postcopy-preempt-break-huge=off
>   (c) Preempt full:           preempt enabled, x-postcopy-preempt-break-huge=on
>                               (this is the default option when preempt enabled)
> 
> Here the x-postcopy-preempt-break-huge parameter was added in v2 so that,
> for debugging purposes, the behavior of interrupting the sending of a
> precopy huge page can be disabled.  When it's off, postcopy will not preempt
> precopy while it is sending a huge page, but postcopy will still use its own
> channel.
> 
> I tested the cases separately to give a rough idea of how much each part of
> the change helped.  The overall benefit is the comparison between cases (a)
> and (c).
> 
>   |-----------+---------+-----------------------+--------------|
>   | Page size | Vanilla | Preempt no-break-huge | Preempt full |
>   |-----------+---------+-----------------------+--------------|
>   | 4K        |   10.68 |               N/A [*] |         0.57 |
>   | 2M        |   10.58 |                  5.49 |         5.02 |
>   | 1G        | 2046.65 |               933.185 |      649.445 |
>   |-----------+---------+-----------------------+--------------|
>   [*]: This case is N/A because with 4K pages there are no huge pages at all
> 
> [1] https://github.com/xzpeter/small-stuffs/blob/master/tools/huge_vm/uffd-latency.bpf
> 
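
For a rough sense of the relative gains, the ratios between the table's columns can be worked out directly. This is a minimal sketch that uses only the numbers quoted above; the names latency, speedup and summarize are invented for the example, and the ratios are unit-free, so no unit is assumed.

```python
# Latency figures copied from the table above (units as reported there).
latency = {
    "4K": {"vanilla": 10.68, "no_break_huge": None, "full": 0.57},
    "2M": {"vanilla": 10.58, "no_break_huge": 5.49, "full": 5.02},
    "1G": {"vanilla": 2046.65, "no_break_huge": 933.185, "full": 649.445},
}


def speedup(baseline, improved):
    """Return baseline / improved, or None when a case was not measured."""
    if baseline is None or improved is None:
        return None
    return baseline / improved


def summarize(table):
    """Compute per-page-size ratios against the vanilla column."""
    result = {}
    for size, row in table.items():
        result[size] = {
            # Separate channel only, case (a) vs (b).
            "channel_only": speedup(row["vanilla"], row["no_break_huge"]),
            # Separate channel plus huge page breaking, case (a) vs (c).
            "overall": speedup(row["vanilla"], row["full"]),
        }
    return result


if __name__ == "__main__":
    for size, ratios in summarize(latency).items():
        parts = []
        for name, value in ratios.items():
            shown = "N/A" if value is None else f"{value:.2f}x"
            parts.append(f"{name}={shown}")
        print(size, " ".join(parts))
```

By this reckoning the overall reduction, case (a) against case (c), is roughly 18.7x for 4K pages, 2.1x for 2M pages and 3.2x for 1G pages.
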
> TODO List
> =========
> 
> Avoid precopy write() blocking postcopy
> ---------------------------------------
> 
> I haven't proven this, but I have always suspected that write() syscalls
> blocking on precopy pages can affect postcopy servicing.  If we can solve
> this problem, my wild guess is that we can further reduce the average page
> latency.
> 
> At least two solutions come to mind: (1) we could make the write side of
> the migration channel NON_BLOCK too, or (2) use multiple threads on the send
> side, just like multifd, but with a lock to protect which page to send
> (the core idea is that we should _never_ rely on the main thread for
> anything; multifd has that dependency, since it queues pages only on the
> main thread).
> 
> That can definitely be done and thought about later.
> 
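
To make option (1) concrete, here is a minimal sketch of what a non-blocking write side buys in general: once the socket buffer is full, send() returns immediately instead of stalling the caller, so the sender stays free to service something more urgent. This is not QEMU code; it uses a plain Python socketpair, and the names try_send, fill_until_would_block and demo are invented for the example.

```python
import socket


def try_send(sock, data):
    """Attempt a non-blocking send; return bytes written, 0 if it would block."""
    try:
        return sock.send(data)
    except BlockingIOError:
        return 0


def fill_until_would_block(sock, chunk):
    """Keep sending chunk until the kernel buffer is full; return total bytes."""
    total = 0
    while True:
        written = try_send(sock, chunk)
        if written == 0:
            return total
        total += written


def demo():
    src, dst = socket.socketpair()
    src.setblocking(False)
    dst.setblocking(False)

    # Stand-in for bulk (precopy-like) traffic saturating the channel.
    sent = fill_until_would_block(src, b"p" * 4096)

    # With a blocking socket this call would stall; here it returns at once.
    would_block = try_send(src, b"u" * 4096) == 0

    # The receiver drains the channel, which frees buffer space again.
    drained = 0
    try:
        while True:
            buf = dst.recv(65536)
            if not buf:
                break
            drained += len(buf)
    except BlockingIOError:
        pass

    resumed = try_send(src, b"u" * 4096) > 0
    src.close()
    dst.close()
    return sent, would_block, drained, resumed


if __name__ == "__main__":
    print(demo())
```
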
> Multi-channel for preemption threads
> ------------------------------------
> 
> Currently the postcopy preempt feature uses only one extra channel and one
> extra thread on dest (no new thread on src QEMU).  It should be good
> enough for most use cases, but when the postcopy queue is long enough
> (e.g. hundreds of vCPUs faulted on different pages) logically we could
> still observe more delays on average.  Whether growing threads/channels can
> solve it is debatable, but it sounds worth a try.  That's yet another
> thing we can think about after this patchset lands.
> 
> Logically the design leaves room for that - the receiving postcopy
> preempt thread understands the whole ram-layer migration protocol, and for
> multiple channels and threads we could simply grow that into multiple
> threads handling the same protocol (with multiple PostcopyTmpPage).  The
> source needs more thought on synchronization, though, but that shouldn't
> affect the protocol layer as a whole, so it should be easy to keep
> compatible.
> 
> Please review, thanks.
> 
> Peter Xu (19):
>   migration: Postpone releasing MigrationState.hostname
>   migration: Drop multifd tls_hostname cache
>   migration: Add pss.postcopy_requested status
>   migration: Move migrate_allow_multifd and helpers into migration.c
>   migration: Export ram_load_postcopy()
>   migration: Move channel setup out of postcopy_try_recover()
>   migration: Allow migrate-recover to run multiple times
>   migration: Add postcopy-preempt capability
>   migration: Postcopy preemption preparation on channel creation
>   migration: Postcopy preemption enablement
>   migration: Postcopy recover with preempt enabled
>   migration: Create the postcopy preempt channel asynchronously
>   migration: Parameter x-postcopy-preempt-break-huge
>   migration: Add helpers to detect TLS capability
>   migration: Export tls-[creds|hostname|authz] params to cmdline too
>   migration: Enable TLS for preempt channel
>   tests: Add postcopy tls migration test
>   tests: Add postcopy tls recovery migration test
>   tests: Add postcopy preempt tests
> 
>  migration/channel.c          |  11 +-
>  migration/migration.c        | 218 ++++++++++++++++++++------
>  migration/migration.h        |  52 ++++++-
>  migration/multifd.c          |  36 +----
>  migration/multifd.h          |   4 -
>  migration/postcopy-ram.c     | 190 ++++++++++++++++++++++-
>  migration/postcopy-ram.h     |  11 ++
>  migration/qemu-file.c        |  27 ++++
>  migration/qemu-file.h        |   1 +
>  migration/ram.c              | 288 +++++++++++++++++++++++++++++++++--
>  migration/ram.h              |   3 +
>  migration/savevm.c           |  49 ++++--
>  migration/socket.c           |  22 ++-
>  migration/socket.h           |   1 +
>  migration/trace-events       |  15 +-
>  qapi/migration.json          |   8 +-
>  tests/qtest/migration-test.c | 113 ++++++++++++--
>  17 files changed, 918 insertions(+), 131 deletions(-)
> 
> -- 
> 2.32.0
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
