From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Thomas Huth <thuth@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	peterx@redhat.com, Juan Quintela <quintela@redhat.com>
Subject: [PATCH v6 0/6] migration/postcopy: Sync faulted addresses after network recovered
Date: Wed, 21 Oct 2020 17:27:15 -0400
Message-ID: <20201021212721.440373-1-peterx@redhat.com>

v6:
- fix page mask to use ramblock psize [Dave]

v5:
- added one test patch for easier debugging for migration-test
- added one fix patch [1] for another postcopy race
- fixed a bug that could trigger when host/guest page size differs

v4:
- use "void */ulong" instead of "uint64_t" where proper in patch 3/4 [Dave]

v3:
- fix build on 32bit hosts & rebase
- remove r-bs for the last 2 patches for Dave due to the changes

v2:
- add r-bs for Dave
- add patch "migration: Properly destroy variables on incoming side" as patch 1
- destroy page_request_mutex in migration_incoming_state_destroy() too [Dave]
- use WITH_QEMU_LOCK_GUARD in two places where we can [Dave]

We've seen the guest hang, under some conditions, on the destination VM after
a postcopy recovery.  The hang resolves itself after a few minutes, however.

The problem is that, after a postcopy recovery, the prioritized postcopy queue
on the source VM is missing.  So all the threads that faulted before the
recovery stay halted until the page happens to be copied by the background
precopy migration stream.

The solution is to refresh this information after a postcopy recovery as well.
To achieve this, we need to maintain a list of faulted addresses on the
destination node, so that we can resend the list when necessary.  This work is
done in patches 2-5.
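
To illustrate the bookkeeping described above, here is a minimal standalone
sketch of a destination-side tracker for faulted addresses.  It is not the
code from this series: FaultTracker, tracker_add(), tracker_remove() and
MAX_PENDING are hypothetical names, and a fixed-size array is assumed only to
keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64  /* hypothetical capacity, for the sketch only */

/* Hypothetical tracker: page-aligned addresses with a request outstanding. */
typedef struct {
    uintptr_t addr[MAX_PENDING];
    size_t count;
} FaultTracker;

/* Round an address down to its page; page_size must be a power of two. */
static uintptr_t page_align(uintptr_t addr, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~((uintptr_t)page_size - 1);
}

/*
 * Record a fault.  Returns true if the page was not tracked yet, i.e. a
 * new page request would have to be sent to the source.
 */
static bool tracker_add(FaultTracker *t, uintptr_t addr, size_t page_size)
{
    uintptr_t page = page_align(addr, page_size);

    for (size_t i = 0; i < t->count; i++) {
        if (t->addr[i] == page) {
            return false;           /* already requested */
        }
    }
    assert(t->count < MAX_PENDING);
    t->addr[t->count++] = page;
    return true;
}

/* Drop a page once it has arrived.  Returns true if it was tracked. */
static bool tracker_remove(FaultTracker *t, uintptr_t page)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->addr[i] == page) {
            t->addr[i] = t->addr[--t->count];   /* swap in the last entry */
            return true;
        }
    }
    return false;
}
```

Whatever is still in the tracker when the channel breaks is exactly the set
of requests that has to be replayed after the recovery.  A real
implementation would also need locking between the fault thread and the
thread that receives pages.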

With that, the last thing we need to do is to send this extra information to
the source VM after the recovery.  Luckily, this synchronization can be
"emulated" by sending a bunch of page requests (although these pages have been
sent previously!) to the source VM, just as when we get a page fault.  Even the
1st version of the postcopy code handles duplicated pages well, so this fix
does not even need a new capability bit, and it'll work smoothly when we
migrate from old QEMUs to new QEMUs.
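
As a rough sketch of the resync idea above (again not the code from this
series), the replay can be pictured as a loop that sends every pending page
through the normal page-request path, with the receiving side dropping any
page it already has.  SendRequestFn, resync_pending(), count_request(),
PageSlot and place_page() are all hypothetical names made up for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for "send one page request". */
typedef bool (*SendRequestFn)(uintptr_t page, void *opaque);

/*
 * Replay every still-pending request after the channel is re-established.
 * Stops at the first send failure and returns how many requests were sent.
 */
static size_t resync_pending(const uintptr_t *pending, size_t n,
                             SendRequestFn send, void *opaque)
{
    size_t sent = 0;

    assert(send != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!send(pending[i], opaque)) {
            break;
        }
        sent++;
    }
    return sent;
}

/* Example callback for the sketch: only counts the requests it is given. */
static bool count_request(uintptr_t page, void *opaque)
{
    (void)page;
    (*(size_t *)opaque)++;
    return true;
}

/* Destination-side view of duplicates: a page is placed at most once. */
typedef struct {
    bool received;
} PageSlot;

/* Returns true if the page was placed, false if it was a duplicate. */
static bool place_page(PageSlot *slot)
{
    if (slot->received) {
        return false;               /* duplicate copy, silently dropped */
    }
    slot->received = true;
    return true;
}
```

Because a duplicated page is harmless on the receiving side, the replay does
not have to know which requests the source had already served before the
interruption.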

Please review, thanks.

Peter Xu (6):
  migration: Pass incoming state into qemu_ufd_copy_ioctl()
  migration: Introduce migrate_send_rp_message_req_pages()
  migration: Maintain postcopy faulted addresses
  migration: Sync requested pages after postcopy recovery
  migration/postcopy: Release fd before going into 'postcopy-pause'
  migration-test: Only hide error if !QTEST_LOG

 migration/migration.c        | 55 ++++++++++++++++++++++++++++++----
 migration/migration.h        | 21 ++++++++++++-
 migration/postcopy-ram.c     | 25 ++++++++++++----
 migration/savevm.c           | 57 ++++++++++++++++++++++++++++++++++++
 migration/trace-events       |  3 ++
 tests/qtest/migration-test.c |  6 +++-
 6 files changed, 154 insertions(+), 13 deletions(-)

-- 
2.26.2





Thread overview: 10+ messages
2020-10-21 21:27 Peter Xu [this message]
2020-10-21 21:27 ` [PATCH v6 1/6] migration: Pass incoming state into qemu_ufd_copy_ioctl() Peter Xu
2020-10-21 21:27 ` [PATCH v6 2/6] migration: Introduce migrate_send_rp_message_req_pages() Peter Xu
2020-10-21 21:27 ` [PATCH v6 3/6] migration: Maintain postcopy faulted addresses Peter Xu
2020-10-22 16:46   ` Dr. David Alan Gilbert
2020-10-21 21:27 ` [PATCH v6 4/6] migration: Sync requested pages after postcopy recovery Peter Xu
2020-10-21 21:27 ` [PATCH v6 5/6] migration/postcopy: Release fd before going into 'postcopy-pause' Peter Xu
2020-10-21 21:27 ` [PATCH v6 6/6] migration-test: Only hide error if !QTEST_LOG Peter Xu
2020-10-22  5:34   ` Thomas Huth
2020-10-26 13:57 ` [PATCH v6 0/6] migration/postcopy: Sync faulted addresses after network recovered Dr. David Alan Gilbert
