From: Ian Campbell <Ian.Campbell@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Juergen Gross <JGross@suse.com>, Wei Liu <wei.liu2@citrix.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>, Tim Deegan <tim@xen.org>,
Xen-devel List <xen-devel@lists.xen.org>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
Shriram Rajagopalan <rshriram@cs.ubc.ca>,
Hongyang Yang <yanghy@cn.fujitsu.com>
Subject: Re: Buggy interaction of live migration and p2m updates
Date: Fri, 21 Nov 2014 09:43:10 +0000
Message-ID: <1416562990.26869.10.camel@citrix.com>
In-Reply-To: <546E32BB.8090909@citrix.com>
On Thu, 2014-11-20 at 18:28 +0000, Andrew Cooper wrote:
> Realistically, this means no updates to the
> p2m at all, due to several potential race conditions.
From the rest of the mail it seems as if you are talking primarily about
changes to the p2m *structure*, i.e. which guest frames contain the p2m
pages, rather than changes to the p2m entries themselves. Is that
correct?
I don't see any (explicit) mention of the pfn_to_mfn_frame_list_list
here, where does that fit in?
> As far as these issues are concerned, there are two distinct p2m
> modifications which we care about:
> 1) p2m structure changes (rearranging the layout of the p2m)
> 2) p2m content changes (altering entries in the p2m)
>
> There is no possible way for the toolstack to prevent a domain from
> altering its p2m. At the moment, ballooning typically only occurs when
> requested by the toolstack, but the underlying operations
> (increase/decrease_reservation, mem_exchange, etc) can be used by the
> guest at any point. This includes Wei's guest memory fragmentation
> changes. Changes to the content of the p2m also occur for grant map and
> unmap operations.
>
>
> Currently in PV guests, the p2m is implemented using a 3-level tree,
> with its root in the guest's shared_info page. It provides a hard VM
> memory limit of 4TB for 32bit PV guests (which is far higher than the
> 128GB limit from the compat p2m mappings), or 512GB for 64bit PV guests.
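The two limits quoted above follow from the tree geometry. A minimal sketch of the arithmetic, assuming 4 KiB frames and p2m entries the size of a guest word (4 bytes for 32-bit guests, 8 bytes for 64-bit guests); the function name is illustrative only, and the mail's "TB"/"GB" are read as binary units:

```python
PAGE_SIZE = 4096  # bytes per frame (assumption: 4 KiB pages)

def p2m_tree_limit(entry_bytes, levels=3):
    """Bytes of guest memory addressable by a tree of page-sized frames."""
    entries_per_frame = PAGE_SIZE // entry_bytes
    # Each level multiplies the fan-out; each leaf entry maps one page.
    return entries_per_frame ** levels * PAGE_SIZE

limit_32 = p2m_tree_limit(4)  # 1024 entries per frame
limit_64 = p2m_tree_limit(8)  # 512 entries per frame

print(limit_32 // 2**40, "TiB")  # prints "4 TiB"
print(limit_64 // 2**30, "GiB")  # prints "512 GiB"
```

The 64-bit limit is the lower of the two only because wider entries leave fewer of them per frame.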
>
> Juergen has a proposed new p2m interface using a virtual linear
> mapping. This is conceptually similar to the previous implementation
> (which is fine from the toolstack's point of view), but far less
> complicated from the guest's point of view, and removes the memory limits
> imposed by the p2m structure.
>
> The new virtual linear mapping suffers from the same interaction issues
> as the old 3-level tree did, but the introduction of the new interface
> affords us an opportunity to make all API modifications at once to
> reduce churn.
>
>
> During live migration, the toolstack maps the guest's p2m into a linear
> mapping in the toolstack's virtual address space. This is done once at
> the start of migration, and never subsequently altered. During live
> migration, the p2m is cross-verified with the m2p, and frames are sent
> using pfns as a reference, as they will be located in different frames
> on the receiving side.
>
> Should the guest change the p2m structure during live migration, the
> toolstack ends up with a stale p2m with a non-p2m frame in the middle,
> resulting in bogus cross-referencing. Should the guest change an entry
> in the p2m, the p2m frame itself will be resent as it would be marked as
> dirty in the logdirty bitmap, but the target pfn will remain unsent and
> probably stale on the receiving side.
>
>
> Another factor which needs to be taken into account is Remus/COLO, which
> run the domains under live migration conditions for the duration of
> their lifetime.
>
> During the live part of migration, the toolstack already has to be able
> to tolerate failures to normalise the pagetables, which result as a
> consequence of the pagetables being in active use. These failures are fatal
> on the final iteration after the guest has been paused, but the same
> logic could be extended to p2m/m2p issues, if needed.
>
>
> There are several potential solutions to these problems.
>
> 1) Freeze the guest's p2m during live migrate
>
> This is the simplest sounding option, but is quite problematic from the
> point of view of the guest. It is essentially a shared spinlock between
> the toolstack and the guest kernel. It would prevent any grant
> map/unmap operations from occurring, and might interact badly with
> certain p2m updates in the guest which would previously be expected to
> unconditionally succeed.
>
> Pros) (Can't think of any)
> Cons) Not easy to implement (even conceptually), requires invasive guest
> changes, will cripple Remus/COLO
>
>
> 2) Deep p2m dirty tracking
>
> In the case that a p2m frame is discovered dirty in the logdirty bitmap,
> we can be certain that a write has occurred to it, and in the common
> case, means that the mapping has changed. The toolstack could maintain
> a non-live copy of the p2m which is updated as new frames are sent.
> When a dirty p2m frame is found, the live and non-live copies can be
> consulted to find which pfn mappings have changed, and locally mark all
> the altered pfns for retransmit.
>
> Pros) No guest changes required
> Cons) Toolstack needs to keep an additional copy of the guest's p2m on
> the sending side
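To make option 2 concrete, here is a rough sketch of the comparison step. It models the p2m as a flat list indexed by pfn and treats the "non-live copy" as a second list recording the mapping at the time each pfn was last sent. All names are hypothetical and nothing here is taken from the real toolstack code:

```python
def changed_pfns(live_p2m, sent_p2m, dirty_p2m_frames, entries_per_frame):
    """Return pfns whose mapping differs between the live p2m and the
    copy recorded when each pfn was last sent.

    Only p2m frames flagged dirty in the logdirty bitmap are scanned.
    """
    changed = set()
    for frame in dirty_p2m_frames:
        start = frame * entries_per_frame
        end = min(start + entries_per_frame, len(live_p2m))
        for pfn in range(start, end):
            if live_p2m[pfn] != sent_p2m[pfn]:
                changed.add(pfn)
    return changed

# Toy example: 4 entries per p2m frame; the guest remapped pfn 6.
sent = [10, 11, 12, 13, 20, 21, 22, 23]
live = [10, 11, 12, 13, 20, 21, 99, 23]
to_resend = changed_pfns(live, sent, dirty_p2m_frames={1}, entries_per_frame=4)
print(sorted(to_resend))  # prints [6]
```

Once a pfn in the result has been retransmitted, the corresponding entry in the non-live copy would be updated to the live value.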
>
> 3) Eagerly check for p2m structure changes.
>
> p2m structure changes are rare after boot, but not impossible. Each
> iteration of live migration, the toolstack can check for dirty
> higher-level p2m frames in the dirty bitmap. In the case that a
> structure update occurs, the toolstack can use information it already
> has to calculate a subset of pfns affected by the update, and mark them
> for resending. (This can currently be done to the frame granularity
> given the p2m frame list, but in combination with 2), could result in
> fewer pfns needing resending.)
>
> Pros) No guest changes required.
> Cons) Moderately high toolstack overhead, Possibility to resend far
> more pfns than strictly required.
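A sketch of the frame-granularity calculation in option 3, assuming the toolstack holds the old p2m frame list (one mfn per p2m leaf frame) and can re-read the current one. The helper name and the flat-list model are assumptions for illustration:

```python
def pfns_to_resend(old_frame_list, new_frame_list, entries_per_frame, max_pfn):
    """Return the pfn ranges covered by p2m leaf frames whose mfn changed.

    Assumes both lists have the same length; entry i covers pfns
    [i * entries_per_frame, (i + 1) * entries_per_frame).
    """
    resend = []
    for idx, (old_mfn, new_mfn) in enumerate(zip(old_frame_list, new_frame_list)):
        if old_mfn != new_mfn:
            start = idx * entries_per_frame
            end = min(start + entries_per_frame, max_pfn + 1)
            resend.append(range(start, end))
    return resend

# Toy example: the guest moved its second p2m leaf frame (mfn 101 -> 205).
ranges = pfns_to_resend([100, 101, 102], [100, 205, 102],
                        entries_per_frame=4, max_pfn=9)
print(ranges)  # prints [range(4, 8)]
```

This over-approximates: every pfn under a moved leaf frame is marked even if its entry is unchanged, which is the "far more pfns than strictly required" cost noted above.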
>
> 4) Request p2m structure change updates from the guest
>
> The guest could provide a "p2m generation count" to allow the toolstack
> to evaluate whether the structure had changed. This would allow the
> live part of migration to periodically re-evaluate whether it should
> remap the p2m to avoid stale mappings.
>
> Pros) Easy to implement alongside the virtual linear mapping support.
> Easy for toolstack and guest
> Cons) Only works with new virtual linear guests.
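One way the toolstack side of option 4 might look, as a sketch only. The mail does not specify the protocol; the re-read of the counter after mapping (to catch a structure change that raced with the remap) is an assumption here, as are all the names:

```python
def refresh_p2m(read_generation, map_p2m, cached=None):
    """Return ((generation, mapping), remapped).

    read_generation and map_p2m are caller-supplied callables standing in
    for reading the guest's counter and mapping its p2m.
    """
    while True:
        gen = read_generation()
        if cached is not None and cached[0] == gen:
            return cached, False          # structure unchanged, keep mapping
        mapping = map_p2m()
        if read_generation() == gen:      # stable across the remap
            return (gen, mapping), True
        cached = None                     # changed underneath us; retry

# Toy guest state standing in for the shared info page.
guest = {"generation": 1, "p2m": [10, 11, 12]}
read_gen = lambda: guest["generation"]
do_map = lambda: list(guest["p2m"])

first, remapped_first = refresh_p2m(read_gen, do_map)
second, remapped_second = refresh_p2m(read_gen, do_map, first)

guest["p2m"].append(13)      # guest grows the p2m ...
guest["generation"] += 1     # ... and bumps the counter
third, remapped_third = refresh_p2m(read_gen, do_map, second)
```

A real implementation would need to bound the retry loop and agree with the guest on when the counter is incremented relative to the structure change.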
>
>
> Proposed solution: A combination of 2, 3 and 4.
>
> For legacy 3-level p2m guests, the toolstack can detect p2m structure
> updates by tracking the p2m top and mid levels in the logdirty bitmap,
> and invalidating the modified subset of pfns. It has to eagerly check
> the p2m frame list list mfn entry in the shared info to see whether the
> guest has swapped onto a completely new p2m.
>
> For a virtual linear map, the intermediate levels are not available to
> track, but we can require that the guest increment a p2m generation count
> in the shared info. When the structure changes, the toolstack can remap
> the p2m and calculate the altered subset of pfns, and mark for resend.
>
> The toolstack must also track changes in the p2m itself, and compare to
> a local copy showing the mapping at the time at which the pfn was last
> sent. This can be used to work out which p2m mappings have changed, and
> also be used to confirm whether the pfns on the receiving side are stale
> or not.
>
> I believe this covers all cases and race conditions. In the case that
> the p2m is updated before the m2p, the p2m frame will be marked dirty in
> the bitmap, and discoverable on the next iteration. At that point, if
> the p2m and m2p are inconsistent, the pfn will be deferred until the
> final iteration. If not, the frame is sent and everything is all ok.
> In the case that the p2m is updated after the m2p, the p2m/m2p will be
> consistent when the dirty bitmap is acted on.
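The per-iteration decision described in that paragraph can be sketched as follows. The p2m is modelled as a list indexed by pfn and the m2p as a dict from mfn to pfn; both are simplifications, and the function name is made up for illustration:

```python
def classify_dirty_pfns(dirty_pfns, p2m, m2p):
    """Split dirty pfns into those safe to send now and those to defer.

    A pfn is sent only if the m2p entry for its mfn points back at it;
    otherwise it is deferred to a later (ultimately the final) iteration.
    """
    send, deferred = [], []
    for pfn in dirty_pfns:
        mfn = p2m[pfn]
        if m2p.get(mfn) == pfn:
            send.append(pfn)
        else:
            deferred.append(pfn)
    return send, deferred

# Toy example: the p2m entry for pfn 2 was updated before the m2p caught up.
p2m = [100, 101, 102]
m2p = {100: 0, 101: 1, 102: 7}
send, deferred = classify_dirty_pfns([0, 1, 2], p2m, m2p)
print(send, deferred)  # prints [0, 1] [2]
```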
>
>
> Thoughts? (for anyone who has made it this far :) I think I have
> covered everything.)
>
> ~Andrew
>