From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>,
qemu-devel@nongnu.org,
Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Subject: Re: [PATCH v2 00/25] migration: Postcopy Preemption
Date: Wed, 2 Mar 2022 12:14:30 +0000
Message-ID: <Yh9fpg0AMdL5sPXd@work-vm>
In-Reply-To: <20220301083925.33483-1-peterx@redhat.com>
* Peter Xu (peterx@redhat.com) wrote:
> This is v2 of postcopy preempt series. It can also be found here:
>
> https://github.com/xzpeter/qemu/tree/postcopy-preempt
>
> RFC: https://lore.kernel.org/qemu-devel/20220119080929.39485-1-peterx@redhat.com
> V1: https://lore.kernel.org/qemu-devel/20220216062809.57179-1-peterx@redhat.com
I've queued some of this:
tests: Pass in MigrateStart** into test_migrate_start()
migration: Add migration_incoming_transport_cleanup()
migration: postcopy_pause_fault_thread() never fails
migration: Enlarge postcopy recovery to capture !-EIO too
migration: Move static var in ram_block_from_stream() into global
migration: Add postcopy_thread_create()
migration: Dump ramblock and offset too when non-same-page detected
migration: Introduce postcopy channels on dest node
migration: Tracepoint change in postcopy-run bottom half
migration: Finer grained tracepoints for POSTCOPY_LISTEN
migration: Dump sub-cmd name in loadvm_process_command tp
> v1->v2 changelog:
> - Picked up more r-bs from Dave
> - Rename both fault threads to drop "qemu/" prefix [Dave]
> - Further rework on postcopy recovery, to be able to detect qemufile errors
> from either main channel or postcopy one [Dave]
> - shutdown() qemufile before close on src postcopy channel when postcopy is
> paused [Dave]
> - In postcopy_preempt_new_channel(), explicitly set the new channel in
> blocking state, even if it's the default [Dave]
> - Make RAMState.postcopy_channel unsigned int [Dave]
> - Added patches:
> - "migration: Create the postcopy preempt channel asynchronously"
> - "migration: Parameter x-postcopy-preempt-break-huge"
> - "migration: Add helpers to detect TLS capability"
> - "migration: Fail postcopy preempt with TLS for now"
> - "tests: Pass in MigrateStart** into test_migrate_start()"
>
> Abstract
> ========
>
> This series adds a new migration capability called "postcopy-preempt". It can
> be enabled when postcopy is enabled, and it simply (but greatly) speeds up the
> handling of postcopy page requests.
>
> Below are some initial postcopy page request latency measurements with the
> new series applied.
>
> For each page size, I measured page request latency for three cases:
>
> (a) Vanilla: the old postcopy
> (b) Preempt no-break-huge: preempt enabled, x-postcopy-preempt-break-huge=off
> (c) Preempt full: preempt enabled, x-postcopy-preempt-break-huge=on
> (this is the default option when preempt enabled)
>
> The x-postcopy-preempt-break-huge parameter is new in v2. It exists for
> debugging: turning it off disables interrupting the send of a precopy huge
> page. So when it's off, postcopy will not preempt precopy in the middle of
> sending a huge page, but postcopy will still use its own channel.
>
> I tested the cases separately to give a rough idea of how much each part of
> the change helps. The overall benefit is the comparison between cases (a)
> and (c).
>
> |-----------+---------+-----------------------+--------------|
> | Page size | Vanilla | Preempt no-break-huge | Preempt full |
> |-----------+---------+-----------------------+--------------|
> | 4K        |   10.68 |               N/A [*] |         0.57 |
> | 2M        |   10.58 |                  5.49 |         5.02 |
> | 1G        | 2046.65 |               933.185 |      649.445 |
> |-----------+---------+-----------------------+--------------|
> [*]: This case is N/A because with 4K pages there is no huge page to break
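
To make the comparison between cases (a), (b) and (c) concrete, here is a
minimal Python sketch that turns the table into speedup ratios. The
dictionary layout and the speedup() helper are made up for illustration and
are not part of the series; the numbers are copied from the table, and the
ratios are unitless.

```python
# Latency figures copied from the table above. The ratios computed below
# are unitless, so the measurement unit does not matter here.
LATENCY = {
    "4K": {"vanilla": 10.68, "no_break_huge": None, "full": 0.57},
    "2M": {"vanilla": 10.58, "no_break_huge": 5.49, "full": 5.02},
    "1G": {"vanilla": 2046.65, "no_break_huge": 933.185, "full": 649.445},
}


def speedup(page_size, case):
    """Return vanilla latency divided by the latency of `case` (None if N/A)."""
    row = LATENCY[page_size]
    if row[case] is None:
        return None
    return row["vanilla"] / row[case]


if __name__ == "__main__":
    for size in LATENCY:
        for case in ("no_break_huge", "full"):
            ratio = speedup(size, case)
            label = "N/A" if ratio is None else f"{ratio:.2f}x"
            print(f"{size:>2} {case:<13} {label}")
```

Read this way, case (c) against case (a) is roughly a 19x improvement for 4K
pages, about 2x for 2M and about 3x for 1G. For huge pages most of the gain
already shows up in case (b), i.e. from the separate channel alone.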
>
> [1] https://github.com/xzpeter/small-stuffs/blob/master/tools/huge_vm/uffd-latency.bpf
>
> TODO List
> =========
>
> TLS support
> -----------
>
> I only noticed it was missing very recently. Since soft freeze is coming, and
> I'm obviously still growing this series, I'd rather have the existing
> material discussed first. Let's see if it can still catch the train for the
> QEMU 7.0 release (soft freeze on 2022-03-08).
>
> Avoid precopy write() blocking postcopy
> ---------------------------------------
>
> I haven't proven this, but I suspect that write() syscalls blocking on
> precopy pages can delay the servicing of postcopy requests. If we can solve
> this problem, my wild guess is that we can further reduce the average page
> latency.
>
> At least two solutions come to mind: (1) make the write side of the
> migration channel non-blocking too, or (2) use multiple threads on the send
> side, just like multifd, but with a lock to protect the choice of which page
> to send (the core idea is that we should _never_ rely on the main thread for
> anything; multifd has that dependency, since it queues pages only on the
> main thread).
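
As a toy illustration of option (1), here is a small Python sketch of a
sender whose bulk ("precopy") socket is non-blocking. This is not QEMU code
and does not use QEMU's channel API; the socket pairs, the 4096-byte payload
and the helper names are assumptions made only for the example. When the
kernel buffer of the bulk socket fills up, send() fails with EAGAIN (seen as
BlockingIOError in Python) instead of stalling, so the same thread stays
free to service an urgent request on a second socket.

```python
import socket


def try_send(sock, payload):
    """Non-blocking send: return bytes accepted, or 0 if the buffer is full."""
    try:
        return sock.send(payload)
    except BlockingIOError:
        return 0


def fill_then_serve_urgent():
    """Fill the bulk socket until it would block, then serve an urgent page."""
    bulk, bulk_peer = socket.socketpair()      # stands in for the precopy channel
    urgent, urgent_peer = socket.socketpair()  # stands in for the postcopy channel
    try:
        bulk.setblocking(False)
        page = b"\0" * 4096
        accepted_sends = 0
        # Nobody reads from bulk_peer, so the kernel buffer eventually fills.
        while try_send(bulk, page) > 0:
            accepted_sends += 1
        # A blocking write would be stuck at this point. Here the thread is
        # free, so the urgent request goes out immediately.
        urgent.sendall(b"urgent-page")
        received = urgent_peer.recv(64)
        return accepted_sends, received
    finally:
        for s in (bulk, bulk_peer, urgent, urgent_peer):
            s.close()


if __name__ == "__main__":
    sends, data = fill_then_serve_urgent()
    print(f"bulk sends accepted before EAGAIN: {sends}, urgent got: {data!r}")
```

A real implementation would also have to wait for writability (poll or
similar) before retrying the bulk send, which this sketch leaves out.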
>
> That can definitely be done and thought about later.
>
> Multi-channel for preemption threads
> ------------------------------------
>
> Currently the postcopy preempt feature uses only one extra channel and one
> extra thread on dest (no new thread on src QEMU). It should be good enough
> for most use cases, but when the postcopy queue is long enough (e.g.
> hundreds of vCPUs faulting on different pages) we could logically still
> observe higher average delays. Whether adding threads/channels can solve
> that is debatable, but it sounds worth a try. That's yet another thing we
> can think about after this patchset lands.
>
> Logically the design leaves room for that: the receiving postcopy preempt
> thread understands the whole ram-layer migration protocol, and for multiple
> channels and threads we could simply grow that into multiple threads
> handling the same protocol (with multiple PostcopyTmpPage). The source needs
> more thought on synchronization, but that shouldn't affect the protocol
> layer as a whole, so it should be easy to keep compatible.
>
> Patch Layout
> ============
>
> Patch 1-3: Three leftover patches from patchset "[PATCH v3 0/8] migration:
> Postcopy cleanup on ram disgard" that I picked up here too.
>
> https://lore.kernel.org/qemu-devel/20211224065000.97572-1-peterx@redhat.com/
>
> migration: Dump sub-cmd name in loadvm_process_command tp
> migration: Finer grained tracepoints for POSTCOPY_LISTEN
> migration: Tracepoint change in postcopy-run bottom half
>
> Patch 4-9: Original postcopy preempt RFC preparation patches (with slight
> modifications).
>
> migration: Introduce postcopy channels on dest node
> migration: Dump ramblock and offset too when non-same-page detected
> migration: Add postcopy_thread_create()
> migration: Move static var in ram_block_from_stream() into global
> migration: Add pss.postcopy_requested status
> migration: Move migrate_allow_multifd and helpers into migration.c
>
> Patch 10-15: Some newly added patches from working on postcopy recovery
> support. After these patches, the migrate-recover command allows re-entrance,
> which is a very nice side effect.
>
> migration: Enlarge postcopy recovery to capture !-EIO too
> migration: postcopy_pause_fault_thread() never fails
> migration: Export ram_load_postcopy()
> migration: Move channel setup out of postcopy_try_recover()
> migration: Add migration_incoming_transport_cleanup()
> migration: Allow migrate-recover to run multiple times
>
> Patch 16-19: The major work of postcopy preemption implementation is split into
> four patches as suggested by Dave.
>
> migration: Add postcopy-preempt capability
> migration: Postcopy preemption preparation on channel creation
> migration: Postcopy preemption enablement
> migration: Postcopy recover with preempt enabled
>
> Patch 20-23: Newly added patches in this v2 for different purposes,
> mostly amendments to the existing postcopy preempt work.
>
> migration: Create the postcopy preempt channel asynchronously
> migration: Parameter x-postcopy-preempt-break-huge
> migration: Add helpers to detect TLS capability
> migration: Fail postcopy preempt with TLS for now
>
> Patch 24-25: Test cases (including one more patch for cleanup)
>
> tests: Add postcopy preempt test
> tests: Pass in MigrateStart** into test_migrate_start()
>
> Please review, thanks.
>
> Peter Xu (25):
> migration: Dump sub-cmd name in loadvm_process_command tp
> migration: Finer grained tracepoints for POSTCOPY_LISTEN
> migration: Tracepoint change in postcopy-run bottom half
> migration: Introduce postcopy channels on dest node
> migration: Dump ramblock and offset too when non-same-page detected
> migration: Add postcopy_thread_create()
> migration: Move static var in ram_block_from_stream() into global
> migration: Add pss.postcopy_requested status
> migration: Move migrate_allow_multifd and helpers into migration.c
> migration: Enlarge postcopy recovery to capture !-EIO too
> migration: postcopy_pause_fault_thread() never fails
> migration: Export ram_load_postcopy()
> migration: Move channel setup out of postcopy_try_recover()
> migration: Add migration_incoming_transport_cleanup()
> migration: Allow migrate-recover to run multiple times
> migration: Add postcopy-preempt capability
> migration: Postcopy preemption preparation on channel creation
> migration: Postcopy preemption enablement
> migration: Postcopy recover with preempt enabled
> migration: Create the postcopy preempt channel asynchronously
> migration: Parameter x-postcopy-preempt-break-huge
> migration: Add helpers to detect TLS capability
> migration: Fail postcopy preempt with TLS for now
> tests: Add postcopy preempt test
> tests: Pass in MigrateStart** into test_migrate_start()
>
> migration/channel.c | 10 +-
> migration/migration.c | 235 ++++++++++++++++++++-----
> migration/migration.h | 98 ++++++++++-
> migration/multifd.c | 26 +--
> migration/multifd.h | 2 -
> migration/postcopy-ram.c | 244 +++++++++++++++++++++-----
> migration/postcopy-ram.h | 15 ++
> migration/qemu-file.c | 27 +++
> migration/qemu-file.h | 1 +
> migration/ram.c | 330 +++++++++++++++++++++++++++++++----
> migration/ram.h | 3 +
> migration/savevm.c | 70 ++++++--
> migration/socket.c | 22 ++-
> migration/socket.h | 1 +
> migration/trace-events | 19 +-
> qapi/migration.json | 8 +-
> tests/qtest/migration-test.c | 68 ++++++--
> 17 files changed, 983 insertions(+), 196 deletions(-)
>
> --
> 2.32.0
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK