From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Xiaohui Li <xiaohli@redhat.com>,
qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH v4 4/4] migration: Sync requested pages after postcopy recovery
Date: Fri, 2 Oct 2020 19:32:57 +0100
Message-ID: <20201002183257.GN3286@work-vm>
In-Reply-To: <20201002175336.30858-5-peterx@redhat.com>
* Peter Xu (peterx@redhat.com) wrote:
> We synchronize the requested pages right after a postcopy recovery happens.
> This helps to synchronize the prioritized pages on source so that the faulted
> threads can be served faster.
>
> Reported-by: Xiaohui Li <xiaohli@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/savevm.c     | 57 ++++++++++++++++++++++++++++++++++++++++++
>  migration/trace-events |  1 +
>  2 files changed, 58 insertions(+)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 34e4b71052..1dc021ee53 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2011,6 +2011,49 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>      return LOADVM_QUIT;
>  }
>
> +/* We must be with page_request_mutex held */
> +static gboolean postcopy_sync_page_req(gpointer key, gpointer value,
> +                                       gpointer data)
> +{
> +    MigrationIncomingState *mis = data;
> +    void *host_addr = (void *) key;
> +    ram_addr_t rb_offset;
> +    RAMBlock *rb;
> +    int ret;
> +
> +    rb = qemu_ram_block_from_host(host_addr, true, &rb_offset);
> +    if (!rb) {
> +        /*
> +         * This should _never_ happen. However be nice for a migrating VM to
> +         * not crash/assert. Post an error (note: intended to not use *_once
> +         * because we do want to see all the illegal addresses; and this can
> +         * never be triggered by the guest so we're safe) and move on next.
> +         */
> +        error_report("%s: illegal host addr %p", __func__, host_addr);
> +        /* Try the next entry */
> +        return FALSE;
> +    }
> +
> +    ret = migrate_send_rp_message_req_pages(mis, rb, rb_offset);
> +    if (ret) {
> +        /* Please refer to above comment. */
> +        error_report("%s: send rp message failed for addr %p",
> +                     __func__, host_addr);
> +        return FALSE;
> +    }
> +
> +    trace_postcopy_page_req_sync(host_addr);
> +
> +    return FALSE;
> +}
> +
> +static void migrate_send_rp_req_pages_pending(MigrationIncomingState *mis)
> +{
> +    WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex) {
> +        g_tree_foreach(mis->page_requested, postcopy_sync_page_req, mis);
> +    }
> +}
> +
>  static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>  {
>      if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> @@ -2033,6 +2076,20 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>      /* Tell source that "we are ready" */
>      migrate_send_rp_resume_ack(mis, MIGRATION_RESUME_ACK_VALUE);
>
> +    /*
> +     * After a postcopy recovery, the source should have lost the postcopy
> +     * queue, or potentially the requested pages could have been lost during
> +     * the network down phase. Let's re-sync with the source VM by re-sending
> +     * all the pending pages that we eagerly need, so these threads won't get
> +     * blocked too long due to the recovery.
> +     *
> +     * Without this procedure, the faulted destination VM threads (waiting for
> +     * page requests right before the postcopy is interrupted) can keep hanging
> +     * until the pages are sent by the source during the background copying of
> +     * pages, or another thread faulted on the same address accidentally.
> +     */
> +    migrate_send_rp_req_pages_pending(mis);
> +
>      return 0;
>  }
>
> diff --git a/migration/trace-events b/migration/trace-events
> index e4d5eb94ca..0fbfd2da60 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -49,6 +49,7 @@ vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
> vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
> postcopy_pause_incoming(void) ""
> postcopy_pause_incoming_continued(void) ""
> +postcopy_page_req_sync(void *host_addr) "sync page req %p"
>
> # vmstate.c
> vmstate_load_field_error(const char *field, int ret) "field \"%s\" load failed, ret = %d"
> --
> 2.26.2
>
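
For anyone following the thread, the resend-on-recovery pattern in the
patch above can be sketched in standalone C roughly as follows. This is
only an illustration, not QEMU code: the names resync_pending,
resync_stats, send_req_fn and toy_send are hypothetical, and a plain
array stands in for the GTree of pending faulted addresses. The point it
shows is that a failed entry is reported and skipped, and the walk still
continues, which is what returning FALSE from the traversal callback
achieves in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sender type: returns 0 on success, non-zero on failure. */
typedef int (*send_req_fn)(uint64_t addr, void *opaque);

/* Counters describing one replay pass over the pending requests. */
struct resync_stats {
    size_t sent;
    size_t failed;
};

/*
 * Replay every pending page request after a recovery.  A failure is
 * reported and counted, but the walk always moves on to the next entry.
 * The caller is assumed to hold whatever lock protects 'pending'.
 */
static struct resync_stats resync_pending(const uint64_t *pending, size_t n,
                                          send_req_fn send, void *opaque)
{
    struct resync_stats st = { 0, 0 };

    assert(send != NULL);
    for (size_t i = 0; i < n; i++) {
        if (send(pending[i], opaque) != 0) {
            fprintf(stderr, "resync: send failed for addr 0x%llx\n",
                    (unsigned long long)pending[i]);
            st.failed++;
            continue;   /* try the next entry */
        }
        st.sent++;
    }
    return st;
}

/* Toy sender for illustration: rejects the one address passed in opaque. */
static int toy_send(uint64_t addr, void *opaque)
{
    const uint64_t *bad = opaque;

    return (bad != NULL && addr == *bad) ? -1 : 0;
}
```

Calling resync_pending() with toy_send and one rejected address still
visits every entry; the rejected one is only counted in 'failed'.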
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK