From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Xen-devel List <xen-devel@lists.xen.org>
Cc: Ian Campbell <Ian.Campbell@citrix.com>, Tim Deegan <tim@xen.org>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
Paul Durrant <Paul.Durrant@citrix.com>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>
Subject: Migration memory corruption - PV backends need to quiesce
Date: Fri, 27 Jun 2014 17:51:25 +0100
Message-ID: <53ADA10D.8030206@citrix.com>
Hello,
After a long time fixing my own memory corruption bugs with migration
v2, I have finally tracked down (what I really really hope is) the last
of the corruption.
There appears to be a systematic problem affecting all PV drivers,
whereby a non-quiescent backend can cause memory corruption in the VM.
Active grant mapped pages are only reflected in the dirty bitmap after
the grant has been unmapped, as mapping the ring read-only would be
catastrophic to performance, and remapping as read-only when logdirty is
enabled is (as far as I understand) impossible, as Xen doesn't track the
PTEs pointing at granted frames.
PV backend drivers hold their mappings of the rings (and persistently
granted frames) open until the domain is destroyed, which is after the
memory image has been sent. Therefore, any requests which are processed
after the migration code has sent the ring frame on its first pass will
not be reflected in the resumed domain, as this frame will never be
marked as dirty in Xen.
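The missed-update case above can be illustrated with a toy model. This is
only a sketch, not Xen or libxc code: ToyDomain, guest_write(),
backend_write() and migrate() are hypothetical names, the dirty bitmap is
modelled as a set, and the live phase is collapsed into a single first pass
followed by one final pass.

```python
# Toy model of log-dirty migration; not Xen code. All names are hypothetical.

class ToyDomain:
    def __init__(self, num_frames):
        self.frames = [0] * num_frames
        self.dirty = set()  # stand-in for Xen's dirty bitmap

    def guest_write(self, frame, value):
        # Writes through the guest's own mappings are tracked by logdirty.
        self.frames[frame] = value
        self.dirty.add(frame)

    def backend_write(self, frame, value):
        # Writes through a still-active grant mapping change the frame
        # but are not reflected in the dirty bitmap.
        self.frames[frame] = value


def migrate(domain, activity_between_passes):
    dest = list(domain.frames)   # first pass: send every frame
    domain.dirty.clear()
    activity_between_passes(domain)
    # Final pass: resend only the frames marked dirty.
    for frame in sorted(domain.dirty):
        dest[frame] = domain.frames[frame]
    return dest


RING_FRAME = 3  # arbitrary frame number chosen for the example


def activity(domain):
    domain.guest_write(1, 0xAA)            # tracked, so it is resent
    domain.backend_write(RING_FRAME, 0xBB)  # untracked, so it is lost


src = ToyDomain(8)
dest = migrate(src, activity)
stale = [f for f in range(8) if dest[f] != src.frames[f]]
print(stale)  # frames whose contents differ on the receiving side
```

In this model the only stale frame on the receiving side is the one the
backend wrote through its grant mapping.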
Furthermore, as the migration code uses memcpy() on the frames, it is
possible that a backend update intersects with the copy, and a corrupt
descriptor appears on the resumed side.
In addition, after the domain has been paused, the backend might still
process requests. The migration code expects the guest to be completely
quiesced after it has been suspended, so will only check the dirty
bitmap once. Any requests which get processed and completed after that
check might still be missed by the migration code.
From a heavily instrumented Xen and migration code, I am fairly sure I
have confirmed that all pages corrupted on migration are a result of
still-active grant maps, grant copies which complete after domain
suspend, or the xenstore ring, of which xenstored has a magic mapping
that will never be reflected in the dirty bitmap.
Overall, it would appear that there needs to be a hook for all PV
drivers to force quiescence. In particular, a backend must guarantee to
unmap all active grant maps (so the frames get properly reflected in the
dirty bitmap), and never process subsequent requests (so no new frames
appear dirty in the bitmap after the guest has been paused).
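As a rough illustration of the two guarantees such a hook would need to
give, here is a toy backend. It is a sketch under assumptions: ToyBackend
and its methods are hypothetical, and it models the behaviour described
above, where a granted frame only shows up in the dirty bitmap once its
grant has been unmapped.

```python
# Toy model of a backend quiesce hook; not a real Xen or driver API.

class ToyBackend:
    def __init__(self, dirty_bitmap):
        self.dirty_bitmap = dirty_bitmap  # shared set standing in for Xen's bitmap
        self.mapped = set()               # frames with an active grant mapping
        self.quiesced = False

    def map_grant(self, frame):
        if self.quiesced:
            raise RuntimeError("backend is quiesced; no new grant maps")
        self.mapped.add(frame)

    def process_request(self, frame):
        # After quiesce, refuse all further work so that no new frames
        # become dirty once the guest has been paused.
        if self.quiesced or frame not in self.mapped:
            return False
        return True

    def quiesce(self):
        # Stop accepting requests first, then unmap every active grant so
        # each frame is reflected in the dirty bitmap.
        self.quiesced = True
        for frame in sorted(self.mapped):
            self.dirty_bitmap.add(frame)
        self.mapped.clear()


dirty = set()
backend = ToyBackend(dirty)
backend.map_grant(3)
served_before = backend.process_request(3)
dirty_before = set(dirty)
backend.quiesce()
served_after = backend.process_request(3)
```

The ordering inside quiesce() is the point of the sketch: requests are
refused before the unmap, so nothing can touch a frame after it has been
marked dirty.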
Thoughts/comments?
~Andrew