From: Joshua Otto <jtotto@uwaterloo.ca>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
xen-devel@lists.xenproject.org
Cc: ian.jackson@eu.citrix.com, hjarmstr@uwaterloo.ca,
wei.liu2@citrix.com, czylin@uwaterloo.ca, imhy.yang@gmail.com
Subject: Re: [PATCH RFC 00/20] Add postcopy live migration support
Date: Fri, 31 Mar 2017 00:51:46 -0400
Message-ID: <20170331045136.GA2415@eagle>
In-Reply-To: <e5258f60-12bc-eb11-4203-25bcc02513f9@citrix.com>
On Wed, Mar 29, 2017 at 11:50:52PM +0100, Andrew Cooper wrote:
> On 27/03/2017 10:06, Joshua Otto wrote:
> > Hi,
> >
> > We're a team of three fourth-year undergraduate software engineering students at
> > the University of Waterloo in Canada. In late 2015 we posted on the list [1] to
> > ask for a project to undertake for our program's capstone design project, and
> > Andrew Cooper pointed us in the direction of the live migration implementation
> > as an area that could use some attention. We were particularly interested in
> > post-copy live migration (as evaluated by [2] and discussed on the list at [3]),
> > and have been working on an implementation of this on-and-off since then.
> >
> > We now have a working implementation of this scheme, and are submitting it for
> > comment. The changes are also available as the 'postcopy' branch of the GitHub
> > repository at [4]
> >
> > As a brief overview of our approach:
> > - We introduce a mechanism by which libxl can indicate to the libxc stream
> > helper process that the iterative migration precopy loop should be terminated
> > and postcopy should begin.
> > - At this point, we suspend the domain, collect the final set of dirty pfns and
> > write these pfns (and _not_ their contents) into the stream.
> > - At the destination, the xc restore logic registers itself as a pager for the
> > migrating domain, 'evicts' all of the pfns indicated by the sender as
> > outstanding, and then resumes the domain at the destination.
> > - As the domain executes, the migration sender continues to push the remaining
> > outstanding pages to the receiver in the background. The receiver
> > monitors both the stream for incoming page data and the paging ring event
> > channel for page faults triggered by the guest. Page faults are forwarded on
> > the back-channel migration stream to the migration sender, which prioritizes
> > these pages for transmission.
> >
> > By leveraging the existing paging API, we are able to implement the postcopy
> > scheme without any hypervisor modifications - all of our changes are confined to
> > the userspace toolstack. However, we inherit from the paging API the
> > requirement that the domains be HVM and that the host have HAP/EPT support.
>
> Wow. Considering that the paging API has had no in-tree consumers (and
> its out-of-tree consumer folded), I am astounded that it hasn't bitrotten.
Well, there's tools/xenpaging, which was a helpful reference when
putting this together. The user-space pager actually has rotted a bit
(I'm fairly certain the VM event ring protocol has changed subtly under
its feet), so I also needed to consult tools/xen-access to get things
right.
>
> >
> > We haven't yet had the opportunity to perform a quantitative evaluation of the
> > performance trade-offs between the traditional pre-copy and our post-copy
> > strategies, but intend to. Informally, we've been testing our implementation by
> > migrating a domain running the x86 memtest program (which is obviously a
> > tremendously write-heavy workload), and have observed a substantial reduction in
> > total time required for migration completion (at the expense of a visually
> > obvious 'slowdown' in the execution of the program).
>
> Do you have any numbers, even for this informal testing?
We have a much more ambitious test matrix planned, but sure, here's an
early encouraging set of measurements - for a domain with 2GB of memory
and a 256MB writable working set (the application driving the writes
being fio submitting writes against a ramdisk), we measured these times:
                   | Pre-copy +          | 1 precopy iteration
                   | stop-and-copy (s)   | + postcopy (s)
-------------------+---------------------+---------------------
Precopy duration   | 66.97               | 44.44
Suspend duration   | 6.807               | 3.23
Postcopy duration  | N/A                 | 4.83
However...
That 3.23s suspend for the hybrid migration seems too high, doesn't it?
There's currently a serious performance bug that we're still trying to
work out in the case of pure-postcopy migrations, with no leading
precopy. Attempting a pure postcopy migration when running the
experiment above yields:
                   | Pure postcopy (s)
-------------------+-------------------
Precopy duration   | 0
Suspend duration   | 21.93
Postcopy duration  | 44.22
Although the postcopy scheme clearly works, it takes 21.93s (!) to
unpause the guest at the destination. The eviction of the unmigrated
pages takes a second or two because of the lack of batching
support (still bad, but not this bad) - the holdup is somewhere in the
domain creation sequence between domcreate_stream_done() and
domcreate_complete().
I suspect that this is the result of a bad interaction between QEMU's
startup sequence (its foreign memory mapping behaviour in particular)
and the postcopy paging. Specifically: the paging ring has room only
for 8 requests at a time. When QEMU attempts to map a large range, the
range gets postcopy-faulted over synchronously in batches of 8 pages at
a time, and each such batch implies a synchronous copy of its pages
over the network (and the 100us xenforeignmemory_map() retry timer)
before the next batch can begin.
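
To make the suspected cost concrete, here is a rough model of that
behaviour. It is only a sketch: the ring capacity of 8 comes from the
description above, while the function names and the per-batch round-trip
time are assumptions for illustration, not measurements.

```python
import math

# Paging ring capacity as described above: 8 requests at a time.
RING_SLOTS = 8


def batches_needed(num_pages, ring_slots=RING_SLOTS):
    """Number of synchronous fault batches needed to populate num_pages."""
    return math.ceil(num_pages / ring_slots)


def estimated_stall_seconds(num_pages, round_trip_s, ring_slots=RING_SLOTS):
    """Lower bound on the stall if every batch costs one full round trip.

    round_trip_s is a hypothetical per-batch cost (network copy plus the
    map retry delay); it is an input to the model, not a measured value.
    """
    return batches_needed(num_pages, ring_slots) * round_trip_s
```

For example, assuming 4KiB pages, mapping a 256MB range means 65536
pages, so batches_needed(65536) is 8192 synchronous batches. Even a
small per-batch delay adds up quickly at that count.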
If I am able to confirm that this is the case, a sensible solution would
seem to be supporting paging range-population requests (i.e. a new
paging ring request type for a _range_ of gfns). In the meantime, you
should expect to observe this effect as well in experiments. It appears
to be largely (but not completely) mitigated by performing a single
pre-copy iteration first.
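
As a sketch of what range-population requests could buy, the snippet
below coalesces a set of faulting gfns into contiguous ranges. The
RangeRequest type and coalesce() helper are hypothetical names made up
for this illustration; nothing like them exists in the paging API today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRequest:
    """Hypothetical request covering `count` gfns starting at `first_gfn`."""
    first_gfn: int
    count: int


def coalesce(gfns):
    """Merge individual gfns into the fewest contiguous range requests."""
    ranges = []
    for gfn in sorted(set(gfns)):
        if ranges and ranges[-1].first_gfn + ranges[-1].count == gfn:
            # gfn directly follows the last range, so extend that range.
            last = ranges[-1]
            ranges[-1] = RangeRequest(last.first_gfn, last.count + 1)
        else:
            # Gap before this gfn, so start a new range.
            ranges.append(RangeRequest(gfn, 1))
    return ranges
```

With something along these lines, a large contiguous mapping would
become a single request rather than one ring slot per page.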
>
> > We've also noticed that,
> > when performing a postcopy without any leading precopy iterations, the time
> > required at the destination to 'evict' all of the outstanding pages is
> > substantial - possibly because there is no batching mechanism by which pages can
> > be evicted - so this area in particular might require further attention.
> >
> > We're really interested in any feedback you might have!
>
> Do you have a design document for this? The spec modifications and code
> comments are great, but there is no substitute (as far as understanding
> goes) for a description in terms of the algorithm and design choices.
As I replied to Wei, not yet, but we'd happily prepare one for v2.
Thanks!
Josh