* [PATCH 0/6] [VERY RFC] Migration Stream v2
@ 2014-04-09 18:28 Andrew Cooper
2014-04-09 18:28 ` [PATCH 1/6] [HACK] tools/libxc: save/restore v2 framework Andrew Cooper
` (7 more replies)
0 siblings, 8 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-09 18:28 UTC (permalink / raw)
To: Xen-devel
Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson, Tim Deegan,
Frediano Ziglio, David Vrabel, Jan Beulich
Hello,
Presented here for early review is a basic implementation of PV guest
migration using the v2 stream format.
PV non-live migration is believed-working; i.e. xl save/restore.
One caveat is 32bit PV domains with 64bit toolstacks, a combination which is
not currently expected to work. There is an architectural problem when using the
toolstack domain's m2p to shoot Xen mappings from a PV guest, which is hidden by
another over-aggressive bug in the live part of v1 migration. That is why v1
currently works (albeit with a risk of shooting too many guest PTEs). As 'live'
is not yet implemented in v2, the second bug has not been replicated.
The code has been a clean rewrite, using the v1 code as a reference but
avoiding obsolete areas (e.g. how to modify the pagetables of a 32bit non-pae
guest on 32bit pae Xen).
Some design decisions have been taken very deliberately (e.g. splitting the
logic for PV and HVM migration) while others have been more along the lines of
"I think it's a sensible thing to do given a lack of any evidence/opinion to
the contrary".
The error handling is known to be only semi-consistent. Functions return 0 for
success and non-zero for failure. This is typically -1, although errno is not
always relevant. However, the logging messages should all be relevant and
correct. Making this properly consistent will involve wider effort across all
of libxc.
Patches:
* 1 is a gross hack to allow the two versions to coexist
* 2 is mainly a header file following the specification (draft E)
* 3 is a set of python scripts for validation of streams
* 4 is some common PV code
* 5 is an implementation of PV save
* 6 is an implementation of PV restore
The rough order of forthcoming work is:
* Fix architectural bug (new hypercall required)
* Get live migration working without the risk of corrupting 32bit guests
* Get HVM migration working (conceptually easier)
* Get some of the optional features working (tmem blobs, etc)
An area needing discussion is how to do v1 -> v2 transformations for a one-time
upgrade. There is a (currently very basic) python script which can pick apart a
v1 stream, and a separate python library to write v2 streams.
One option would be to combine these two into a program which takes two fds,
which libxc can exec() out to. There is deliberate flexibility in the v2
restore code which allows a v1 -> v2 transformation on a stream without seeking.
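
As a rough illustration of the two-fd idea, a converter could be shaped like
the sketch below. It is Python 3 and purely hypothetical: convert_stream(),
write_record() and the omitted v1 parsing are assumptions made for
illustration, not code from this series. Only the header layouts are taken
from patch 2 and streamspec.py. The point it tries to show is that the output
is written strictly sequentially, so neither fd needs to be seekable.

```python
import os
import struct

IHDR_FORMAT = "!QIIHHI"   # image header, as in streamspec.py
DHDR_FORMAT = "=IHHII"    # domain header
RH_FORMAT = "=II"         # record header

IHDR_IDENT = 0x58454E46   # "XENF" in ASCII
IHDR_OPT_LE = 0           # little endian stream
DHDR_TYPE_x86_pv = 0x00000001
REC_TYPE_end = 0x00000000


def write_record(out, rec_type, data=b""):
    """Write one v2 record, zero-padded to a multiple of 8 bytes."""
    out.write(struct.pack(RH_FORMAT, rec_type, len(data)))
    out.write(data)
    out.write(b"\x00" * (-len(data) % 8))


def convert_stream(in_fd, out_fd, xen_major, xen_minor):
    """Hypothetical filter: consume a v1 stream on in_fd, emit v2 on out_fd."""
    with os.fdopen(in_fd, "rb") as src, os.fdopen(out_fd, "wb") as dst:
        dst.write(struct.pack(IHDR_FORMAT, 0xffffffffffffffff,
                              IHDR_IDENT, 1, IHDR_OPT_LE, 0, 0))
        dst.write(struct.pack(DHDR_FORMAT, DHDR_TYPE_x86_pv, 12, 0,
                              xen_major, xen_minor))
        # Placeholder: a real converter would parse v1 chunks from src here
        # and emit the matching v2 records as it goes.  This sketch only
        # drains the input.
        while src.read(65536):
            pass
        write_record(dst, REC_TYPE_end)
```

libxc would then only need to create a pipe, exec() the converter with the
original fd as input, and hand the read end of the pipe to the v2 restore code.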
Anyway - the code is presented for initial comment/query/criticism.
~Andrew
* [PATCH 1/6] [HACK] tools/libxc: save/restore v2 framework
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
@ 2014-04-09 18:28 ` Andrew Cooper
2014-04-09 18:28 ` [PATCH 2/6] tools/libxc: Stream specification and some common code Andrew Cooper
` (6 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-09 18:28 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper
For testing purposes, the environment variable "XG_MIGRATION_V2" allows the
two save/restore codepaths to coexist, and provides a runtime switch.
It is intended that once this series is less RFC, the v2 framework will
completely replace v1.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
tools/libxc/Makefile | 1 +
tools/libxc/saverestore/common.h | 15 +++++++++++++++
tools/libxc/saverestore/restore.c | 23 +++++++++++++++++++++++
tools/libxc/saverestore/save.c | 20 ++++++++++++++++++++
tools/libxc/xc_domain_restore.c | 8 ++++++++
tools/libxc/xc_domain_save.c | 7 +++++++
tools/libxc/xenguest.h | 14 ++++++++++++++
7 files changed, 88 insertions(+)
create mode 100644 tools/libxc/saverestore/common.h
create mode 100644 tools/libxc/saverestore/restore.c
create mode 100644 tools/libxc/saverestore/save.c
diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 2cca2b2..5fd5cb5 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -44,6 +44,7 @@ GUEST_SRCS-y :=
GUEST_SRCS-y += xg_private.c xc_suspend.c
ifeq ($(CONFIG_MIGRATE),y)
GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c
+GUEST_SRCS-y += $(wildcard saverestore/*.c)
GUEST_SRCS-y += xc_offline_page.c xc_compression.c
else
GUEST_SRCS-y += xc_nomigrate.c
diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
new file mode 100644
index 0000000..f1aff44
--- /dev/null
+++ b/tools/libxc/saverestore/common.h
@@ -0,0 +1,15 @@
+#ifndef __COMMON__H
+#define __COMMON__H
+
+#include "../xg_private.h"
+
+#endif
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/saverestore/restore.c b/tools/libxc/saverestore/restore.c
new file mode 100644
index 0000000..6624baa
--- /dev/null
+++ b/tools/libxc/saverestore/restore.c
@@ -0,0 +1,23 @@
+#include "common.h"
+
+int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
+ unsigned int store_evtchn, unsigned long *store_mfn,
+ domid_t store_domid, unsigned int console_evtchn,
+ unsigned long *console_mfn, domid_t console_domid,
+ unsigned int hvm, unsigned int pae, int superpages,
+ int checkpointed_stream,
+ struct restore_callbacks *callbacks)
+{
+ IPRINTF("In experimental %s", __func__);
+ return -1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
new file mode 100644
index 0000000..c013e62
--- /dev/null
+++ b/tools/libxc/saverestore/save.c
@@ -0,0 +1,20 @@
+#include "common.h"
+
+int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
+ uint32_t max_factor, uint32_t flags,
+ struct save_callbacks* callbacks, int hvm,
+ unsigned long vm_generationid_addr)
+{
+ IPRINTF("In experimental %s", __func__);
+ return -1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index bcb0ae0..faa458a 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1468,6 +1468,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
struct restore_ctx *ctx = &_ctx;
struct domain_info_context *dinfo = &ctx->dinfo;
+ if ( getenv("XG_MIGRATION_V2") )
+ {
+ return xc_domain_restore2(
+ xch, io_fd, dom, store_evtchn, store_mfn,
+ store_domid, console_evtchn, console_mfn, console_domid,
+ hvm, pae, superpages, checkpointed_stream, callbacks);
+ }
+
DPRINTF("%s: starting restore of new domid %u", __func__, dom);
pagebuf_init(&pagebuf);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 71f9b59..c94e3e6 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -895,6 +895,13 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
int completed = 0;
+ if ( getenv("XG_MIGRATION_V2") )
+ {
+ return xc_domain_save2(xch, io_fd, dom, max_iters,
+ max_factor, flags, callbacks, hvm,
+ vm_generationid_addr);
+ }
+
DPRINTF("%s: starting save of domid %u", __func__, dom);
if ( hvm && !callbacks->switch_qemu_logdirty )
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 1f216cd..b80df82 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -89,6 +89,11 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
struct save_callbacks* callbacks, int hvm,
unsigned long vm_generationid_addr);
+/* Domain Save v2 */
+int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
+ uint32_t max_factor, uint32_t flags,
+ struct save_callbacks* callbacks, int hvm,
+ unsigned long vm_generationid_addr);
/* callbacks provided by xc_domain_restore */
struct restore_callbacks {
@@ -128,6 +133,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
int no_incr_generationid, int checkpointed_stream,
unsigned long *vm_generationid_addr,
struct restore_callbacks *callbacks);
+
+/* Domain Restore v2 */
+int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
+ unsigned int store_evtchn, unsigned long *store_mfn,
+ domid_t store_domid, unsigned int console_evtchn,
+ unsigned long *console_mfn, domid_t console_domid,
+ unsigned int hvm, unsigned int pae, int superpages,
+ int checkpointed_stream,
+ struct restore_callbacks *callbacks);
/**
* xc_domain_restore writes a file to disk that contains the device
* model saved state.
--
1.7.10.4
* [PATCH 2/6] tools/libxc: Stream specification and some common code
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
2014-04-09 18:28 ` [PATCH 1/6] [HACK] tools/libxc: save/restore v2 framework Andrew Cooper
@ 2014-04-09 18:28 ` Andrew Cooper
2014-04-09 18:28 ` [PATCH 3/6] tools/libxc: Scripts for inspection/validation of legacy and new streams Andrew Cooper
` (5 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-09 18:28 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
tools/libxc/saverestore/common.c | 59 ++++++++++++++
tools/libxc/saverestore/common.h | 8 ++
tools/libxc/saverestore/stream_format.h | 130 +++++++++++++++++++++++++++++++
3 files changed, 197 insertions(+)
create mode 100644 tools/libxc/saverestore/common.c
create mode 100644 tools/libxc/saverestore/stream_format.h
diff --git a/tools/libxc/saverestore/common.c b/tools/libxc/saverestore/common.c
new file mode 100644
index 0000000..d2dfd5a
--- /dev/null
+++ b/tools/libxc/saverestore/common.c
@@ -0,0 +1,59 @@
+#include "common.h"
+
+static const char *dhdr_types[] =
+{
+ [DHDR_TYPE_x86_pv] = "x86 PV",
+ [DHDR_TYPE_x86_hvm] = "x86 HVM",
+ [DHDR_TYPE_x86_pvh] = "x86 PVH",
+ [DHDR_TYPE_arm] = "ARM",
+};
+
+const char *dhdr_type_to_str(uint32_t type)
+{
+ if ( type < ARRAY_SIZE(dhdr_types) && dhdr_types[type] )
+ return dhdr_types[type];
+
+ return "Reserved";
+}
+
+static const char *mandatory_rec_types[] =
+{
+ [REC_TYPE_end] = "End",
+ [REC_TYPE_page_data] = "Page data",
+ [REC_TYPE_x86_pv_info] = "x86 PV info",
+ [REC_TYPE_x86_pv_p2m_frames] = "x86 PV P2M frames",
+ [REC_TYPE_x86_pv_vcpu_basic] = "x86 PV vcpu basic",
+ [REC_TYPE_x86_pv_vcpu_extended] = "x86 PV vcpu extended",
+ [REC_TYPE_x86_pv_vcpu_xsave] = "x86 PV vcpu xsave",
+ [REC_TYPE_x86_pv_shared_info] = "x86 PV shared info",
+ [REC_TYPE_tsc_info] = "TSC info",
+};
+
+/*
+static const char *optional_rec_types[] =
+{
+};
+*/
+
+const char *rec_type_to_str(uint32_t type)
+{
+ if ( type & REC_TYPE_optional )
+ return "Reserved";
+
+ if ( ((type & REC_TYPE_optional) == 0 ) &&
+ (type < ARRAY_SIZE(mandatory_rec_types)) &&
+ (mandatory_rec_types[type]) )
+ return mandatory_rec_types[type];
+
+ return "Reserved";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index f1aff44..fff0a39 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -3,6 +3,14 @@
#include "../xg_private.h"
+#include "stream_format.h"
+
+// TODO: Find a better place to put this...
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
+
+const char *dhdr_type_to_str(uint32_t type);
+const char *rec_type_to_str(uint32_t type);
+
#endif
/*
* Local variables:
diff --git a/tools/libxc/saverestore/stream_format.h b/tools/libxc/saverestore/stream_format.h
new file mode 100644
index 0000000..6898c3c
--- /dev/null
+++ b/tools/libxc/saverestore/stream_format.h
@@ -0,0 +1,130 @@
+#ifndef __STREAM_FORMAT__H
+#define __STREAM_FORMAT__H
+
+#include <inttypes.h>
+
+/*
+ * Image Header
+ */
+struct ihdr
+{
+ uint64_t marker;
+ uint32_t id;
+ uint32_t version;
+ uint16_t options;
+ uint16_t _res1;
+ uint32_t _res2;
+};
+
+#define IHDR_MARKER 0xffffffffffffffffULL
+#define IHDR_ID 0x58454E46U
+#define IHDR_VERSION 1
+
+#define _IHDR_OPT_ENDIAN 0
+#define IHDR_OPT_LITTLE_ENDIAN (0 << _IHDR_OPT_ENDIAN)
+#define IHDR_OPT_BIG_ENDIAN (1 << _IHDR_OPT_ENDIAN)
+
+/*
+ * Domain Header
+ */
+struct dhdr
+{
+ uint32_t type;
+ uint16_t page_shift;
+ uint16_t _res1;
+ uint32_t xen_major;
+ uint32_t xen_minor;
+};
+
+#define DHDR_TYPE_x86_pv 0x00000001U
+#define DHDR_TYPE_x86_hvm 0x00000002U
+#define DHDR_TYPE_x86_pvh 0x00000003U
+#define DHDR_TYPE_arm 0x00000004U
+
+/*
+ * Record Header
+ */
+struct rhdr
+{
+ uint32_t type;
+ uint32_t length;
+};
+
+/* Somewhat arbitrary - 8MB */
+#define REC_LENGTH_MAX (8U << 20)
+
+#define REC_TYPE_end 0x00000000U
+#define REC_TYPE_page_data 0x00000001U
+#define REC_TYPE_x86_pv_info 0x00000002U
+#define REC_TYPE_x86_pv_p2m_frames 0x00000003U
+#define REC_TYPE_x86_pv_vcpu_basic 0x00000004U
+#define REC_TYPE_x86_pv_vcpu_extended 0x00000005U
+#define REC_TYPE_x86_pv_vcpu_xsave 0x00000006U
+#define REC_TYPE_x86_pv_shared_info 0x00000007U
+#define REC_TYPE_tsc_info 0x00000008U
+
+#define REC_TYPE_optional 0x80000000U
+
+/* PAGE_DATA */
+struct rec_page_data_header
+{
+ uint32_t count;
+ uint32_t _res1;
+ uint64_t pfn[0];
+};
+
+#define PAGE_DATA_PFN_MASK 0x000fffffffffffffULL
+#define PAGE_DATA_TYPE_MASK 0xf000000000000000ULL
+
+/* X86_PV_INFO */
+struct rec_x86_pv_info
+{
+ uint8_t guest_width;
+ uint8_t pt_levels;
+ uint8_t options;
+};
+
+/* X86_PV_P2M_FRAMES */
+struct rec_x86_pv_p2m_frames
+{
+ uint32_t start_pfn;
+ uint32_t end_pfn;
+ uint64_t p2m_pfns[0];
+};
+
+/* VCPU_CONTEXT_{basic,extended} */
+struct rec_x86_pv_vcpu
+{
+ uint32_t vcpu_id;
+ uint32_t _res1;
+ uint8_t context[0];
+};
+
+/* VCPU_CONTEXT_XSAVE */
+struct rec_x86_pv_vcpu_xsave
+{
+ uint32_t vcpu_id;
+ uint32_t _res1;
+ uint64_t xfeature_mask;
+ uint8_t context[0];
+};
+
+/* TSC_INFO */
+struct rec_tsc_info
+{
+ uint32_t mode;
+ uint32_t khz;
+ uint64_t nsec;
+ uint32_t incarnation;
+};
+
+#endif
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
1.7.10.4
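
For reviewers, the 64-bit entries in rec_page_data_header.pfn[] can be picked
apart as sketched below. This is Python 3 and not part of the patch:
make_pfn() and split_pfn() are hypothetical helper names, and the shift of 60
is inferred from PAGE_DATA_TYPE_MASK rather than defined in stream_format.h.

```python
PAGE_DATA_PFN_MASK = 0x000fffffffffffff    # low 52 bits: frame number
PAGE_DATA_TYPE_MASK = 0xf000000000000000   # top 4 bits: page type
PAGE_DATA_TYPE_SHIFT = 60                  # assumption: derived from the mask


def make_pfn(pfn, page_type):
    """Combine a frame number and a 4-bit type into one stream entry."""
    if pfn & ~PAGE_DATA_PFN_MASK:
        raise ValueError("pfn does not fit in 52 bits")
    if page_type & ~0xf:
        raise ValueError("type does not fit in 4 bits")
    return (page_type << PAGE_DATA_TYPE_SHIFT) | pfn


def split_pfn(entry):
    """Return (pfn, type, reserved bits) for one 64-bit stream entry."""
    pfn = entry & PAGE_DATA_PFN_MASK
    page_type = (entry & PAGE_DATA_TYPE_MASK) >> PAGE_DATA_TYPE_SHIFT
    # Bits 52-59 are covered by neither mask and should be zero.
    reserved = entry & ~(PAGE_DATA_PFN_MASK | PAGE_DATA_TYPE_MASK) & (2**64 - 1)
    return pfn, page_type, reserved
```

A receiver would presumably reject any entry whose reserved part is non-zero,
which is what verify.py in patch 3 checks with PAGE_DATA_PFN_RESZ_MASK.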
* [PATCH 3/6] tools/libxc: Scripts for inspection/validation of legacy and new streams
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
2014-04-09 18:28 ` [PATCH 1/6] [HACK] tools/libxc: save/restore v2 framework Andrew Cooper
2014-04-09 18:28 ` [PATCH 2/6] tools/libxc: Stream specification and some common code Andrew Cooper
@ 2014-04-09 18:28 ` Andrew Cooper
2014-04-09 18:28 ` [PATCH 4/6] tools/libxc: x86 pv common code Andrew Cooper
` (4 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-09 18:28 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Frediano Ziglio
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libxc/saverestore/scripts/generate.py | 59 ++++
.../libxc/saverestore/scripts/inspect-legacy32.py | 167 ++++++++++
tools/libxc/saverestore/scripts/streamspec.py | 89 ++++++
tools/libxc/saverestore/scripts/verify.py | 330 ++++++++++++++++++++
4 files changed, 645 insertions(+)
create mode 100755 tools/libxc/saverestore/scripts/generate.py
create mode 100755 tools/libxc/saverestore/scripts/inspect-legacy32.py
create mode 100644 tools/libxc/saverestore/scripts/streamspec.py
create mode 100755 tools/libxc/saverestore/scripts/verify.py
diff --git a/tools/libxc/saverestore/scripts/generate.py b/tools/libxc/saverestore/scripts/generate.py
new file mode 100755
index 0000000..3b01e65
--- /dev/null
+++ b/tools/libxc/saverestore/scripts/generate.py
@@ -0,0 +1,59 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+from streamspec import *
+import struct, sys
+
+ihdr = struct.pack(IHDR_FORMAT,
+ 0xffffffffffffffff, # Marker
+ IHDR_IDENT, # "XENF" in ASCII
+ 1, # Version
+ IHDR_OPT_LE, # Options
+ 0, 0 # Reserved
+ )
+
+def emit_record(type, data):
+ length = len(data)
+
+ r = struct.pack(RH_FORMAT, type, length)
+ r += data
+
+ padding_len = (8 - (length & 7)) & 7
+ r += '\x00' * padding_len
+
+ sys.stdout.write(r)
+
+
+def emit_pv():
+
+ dhdr = struct.pack(DHDR_FORMAT,
+ DHDR_TYPE_x86_pv, # Type
+ 12, # Page shift
+ 0, # Reserved
+ 4, # Xen major
+ 5 # Xen minor
+ )
+
+ sys.stdout.write(ihdr)
+ sys.stdout.write(dhdr)
+
+ x86_pv_info = struct.pack(X86_PV_INFO_FORMAT,
+ 8, # Guest width
+ 4, # Guest levels
+ 0 # Options
+ )
+
+ emit_record(REC_TYPE_x86_pv_info, x86_pv_info)
+ emit_record(REC_TYPE_end, "")
+ return 0
+
+
+def main(argv = sys.argv):
+
+ if len(argv) == 0 or argv[0] == "pv":
+ return emit_pv()
+ else:
+ return 1
+
+if __name__ == "__main__":
+ sys.exit(main(sys.argv[1:]))
diff --git a/tools/libxc/saverestore/scripts/inspect-legacy32.py b/tools/libxc/saverestore/scripts/inspect-legacy32.py
new file mode 100755
index 0000000..e20a8f2
--- /dev/null
+++ b/tools/libxc/saverestore/scripts/inspect-legacy32.py
@@ -0,0 +1,167 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+import sys
+import struct
+
+fin = None
+guest_width = 0
+guest_levels = 0
+
+class StreamError(StandardError):
+ pass
+
+def rdexact(n):
+ _ = fin.read(n)
+ if len(_) != n:
+ raise IOError("Stream truncated")
+ #print"> read 0x%x bytes" % (n, )
+ return _
+
+def unpack_exact(fmt):
+ l = struct.calcsize(fmt)
+ return struct.unpack(fmt, rdexact(l))
+
+def read_extended_info():
+ global guest_width, guest_levels
+
+ sig, rem_length = struct.unpack("II", rdexact(8))
+
+ if sig != 0xffffffff:
+ raise StreamError("Bad extended info signature 0x%08x" % (sig,))
+ else:
+ print "Extended Info: length 0x%x" % (rem_length,)
+
+ so_far = 0
+ while so_far < rem_length:
+
+ blkid, datasz = struct.unpack("4sI", rdexact(8))
+ so_far += 8
+
+ print " Record type: %s, size 0x%x" % (blkid, datasz)
+
+ # Eww, but this is how it is done :(
+ if blkid == "vcpu":
+ if datasz == 0x1430:
+ guest_width = 8
+ guest_levels = 4
+ print " 64bit domain, 4 levels"
+ else:
+ raise StreamError("Unable to determine guest width/level")
+
+ rdexact(datasz)
+ so_far += datasz
+
+ if so_far != rem_length:
+ raise StreamError("Overshot total Extended Info size. Consumed 0x%x bytes" % (so_far,))
+
+def read_chunks():
+
+ while True:
+
+ chunk_type, = struct.unpack("i", rdexact(4))
+ print "Chunk: type 0x%x" % (chunk_type,)
+
+ if chunk_type == 0:
+ print " End"
+ return
+
+ elif chunk_type > 0:
+ print " Page Batch"
+ pfn_array = rdexact(chunk_type * 4)
+ page_data = rdexact(chunk_type * 4096)
+
+ elif chunk_type == -2:
+ max_id, = unpack_exact("i")
+ bitmap = rdexact(((max_id/64) + 1) * 8)
+ print " Vcpu info: max_id %d" % (max_id, )
+
+ elif chunk_type == -7:
+ mode, nsec, khz, incarn = unpack_exact("IQII")
+ print " TSC_INFO: mode %s, %d ns, %d khz, %d incarn" % ( mode, nsec, khz, incarn)
+
+ elif chunk_type == -9:
+ print " Last Checkpoint"
+
+ elif chunk_type == -12:
+ sz, = unpack_exact("I")
+ data = rdexact(sz)
+ print " Compressed Data: sz 0x%x" % (sz, )
+
+ elif chunk_type == -18:
+ sz, = unpack_exact("I")
+ data = rdexact(sz)
+ print " Toolstack Data: sz 0x%x" % (sz, )
+
+ else:
+ raise StreamError("Unrecognised chunk")
+
+def main(argv = sys.argv):
+ global fin
+
+ if len(argv) == 2:
+ fin = open(argv[1], "rb")
+ else:
+ fin = sys.stdin
+
+ try:
+ # Skip Xl header
+ if "Xen saved domain, xl format\n \0 \r" != rdexact(32):
+ raise StreamError("No xl header")
+
+ _, _, _, optlen = struct.unpack("=IIII", rdexact(16))
+ rdexact(optlen)
+ print "xl header skipped"
+
+ # P2M size
+ p2m_size, = struct.unpack("I", rdexact(4))
+ print "P2M Size: 0x%x" % (p2m_size,)
+
+ # Extended info
+ read_extended_info()
+
+ # P2M list
+
+ fpp = 4096/guest_width
+ p2m_len = (p2m_size + fpp - 1) / fpp
+
+ print "Reading p2m frames. fpp: %d, p2m_len: %d" % (fpp, p2m_len)
+
+ p2m_frames = rdexact(p2m_len * 4)
+ if p2m_len < 20:
+ print list(struct.unpack("I" * p2m_len, p2m_frames))
+
+ read_chunks()
+
+ unmapped_pfn_count, = unpack_exact("I")
+ unmapped_pfn_list = rdexact(unmapped_pfn_count * 4)
+ print "Unmapped PFN count: 0x%x" % (unmapped_pfn_count, )
+
+ # VCPU Context fudge
+ _ = rdexact(0x1430)
+ _ = rdexact(128)
+ xfeature_mask, xsize = unpack_exact("QQ")
+ _ = rdexact(xsize)
+ print "Got VCPU information"
+
+ shared_info = rdexact(4096)
+ print "Got shinfo"
+
+ if fin.read(1) != "":
+ raise StreamError("Junk found on the end of the stream")
+
+ except (IOError, StreamError, ) as e:
+ print "Error: ", e
+ return 1
+
+ except RuntimeError as e:
+ print "Script error", e
+ print "Please fix me"
+ return 2
+
+ print "Done"
+ return 0
+
+
+if __name__ == "__main__":
+ sys.exit(main(sys.argv))
diff --git a/tools/libxc/saverestore/scripts/streamspec.py b/tools/libxc/saverestore/scripts/streamspec.py
new file mode 100644
index 0000000..12b351e
--- /dev/null
+++ b/tools/libxc/saverestore/scripts/streamspec.py
@@ -0,0 +1,89 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+# Image Header
+IHDR_FORMAT = "!QIIHHI"
+
+IHDR_IDENT = 0x58454E46 # "XENF" in ASCII
+
+IHDR_OPT_ENDIAN_ = 0
+IHDR_OPT_LE = (0 << IHDR_OPT_ENDIAN_)
+IHDR_OPT_BE = (1 << IHDR_OPT_ENDIAN_)
+
+IHDR_OPT_RESZ_MASK = 0xfffe
+
+# Domain Header
+DHDR_FORMAT = "=IHHII"
+
+DHDR_TYPE_x86_pv = 0x00000001
+DHDR_TYPE_x86_hvm = 0x00000002
+DHDR_TYPE_x86_pvh = 0x00000003
+DHDR_TYPE_arm = 0x00000004
+
+dhdr_type_to_str = {
+ DHDR_TYPE_x86_pv : "x86 PV",
+ DHDR_TYPE_x86_hvm : "x86 HVM",
+ DHDR_TYPE_x86_pvh : "x86 PVH",
+ DHDR_TYPE_arm : "ARM",
+}
+
+RH_FORMAT = "=II"
+
+REC_TYPE_end = 0x00000000
+REC_TYPE_page_data = 0x00000001
+REC_TYPE_x86_pv_info = 0x00000002
+REC_TYPE_x86_pv_p2m_frames = 0x00000003
+REC_TYPE_x86_pv_vcpu_basic = 0x00000004
+REC_TYPE_x86_pv_vcpu_extended = 0x00000005
+REC_TYPE_x86_pv_vcpu_xsave = 0x00000006
+REC_TYPE_x86_pv_shared_info = 0x00000007
+REC_TYPE_tsc_info = 0x00000008
+
+rec_type_to_str = {
+ REC_TYPE_end : "End",
+ REC_TYPE_page_data : "Page data",
+ REC_TYPE_x86_pv_info : "x86 PV info",
+ REC_TYPE_x86_pv_p2m_frames : "x86 PV P2M frames",
+ REC_TYPE_x86_pv_vcpu_basic : "x86 PV vcpu basic",
+ REC_TYPE_x86_pv_vcpu_extended : "x86 PV vcpu extended",
+ REC_TYPE_x86_pv_vcpu_xsave : "x86 PV vcpu xsave",
+ REC_TYPE_x86_pv_shared_info : "x86 PV shared info",
+ REC_TYPE_tsc_info : "TSC info",
+}
+
+# page_data
+PAGE_DATA_FORMAT = "=II"
+PAGE_DATA_PFN_MASK = (1L << 52) - 1
+PAGE_DATA_PFN_RESZ_MASK = ((1L << 60) - 1) & ~((1L << 52) - 1)
+
+# flags from xen/public/domctl.h: XEN_DOMCTL_PFINFO_* shifted by 32 bits
+PAGE_DATA_TYPE_SHIFT = 60
+PAGE_DATA_TYPE_LTABTYPE_MASK = (0x7L << PAGE_DATA_TYPE_SHIFT)
+PAGE_DATA_TYPE_LTAB_MASK = (0xfL << PAGE_DATA_TYPE_SHIFT)
+PAGE_DATA_TYPE_LPINTAB = (0x8L << PAGE_DATA_TYPE_SHIFT) # Pinned pagetable
+
+PAGE_DATA_TYPE_NOTAB = (0x0L << PAGE_DATA_TYPE_SHIFT) # Regular page
+PAGE_DATA_TYPE_L1TAB = (0x1L << PAGE_DATA_TYPE_SHIFT) # L1 pagetable
+PAGE_DATA_TYPE_L2TAB = (0x2L << PAGE_DATA_TYPE_SHIFT) # L2 pagetable
+PAGE_DATA_TYPE_L3TAB = (0x3L << PAGE_DATA_TYPE_SHIFT) # L3 pagetable
+PAGE_DATA_TYPE_L4TAB = (0x4L << PAGE_DATA_TYPE_SHIFT) # L4 pagetable
+PAGE_DATA_TYPE_BROKEN = (0xdL << PAGE_DATA_TYPE_SHIFT) # Broken
+PAGE_DATA_TYPE_XALLOC = (0xeL << PAGE_DATA_TYPE_SHIFT) # Allocate-only
+PAGE_DATA_TYPE_XTAB = (0xfL << PAGE_DATA_TYPE_SHIFT) # Invalid
+
+# x86_pv_info
+X86_PV_INFO_FORMAT = "=BBB"
+
+X86_PV_INFO_OPT_VMASST_ = 0
+X86_PV_INFO_OPT_VMASST = (1 << X86_PV_INFO_OPT_VMASST_)
+
+X86_PV_INFO_OPT_RESZ_MASK = 0xfe
+
+# x86_pv_vcpu_{basic,extended}
+X86_PV_VCPU_FORMAT = "=II"
+
+# x86_pv_vcpu_xsave
+X86_PV_VCPU_XSAVE_FORMAT = "=IIQ"
+
+# tsc_info
+TSC_INFO_FORMAT = "=IIQI"
diff --git a/tools/libxc/saverestore/scripts/verify.py b/tools/libxc/saverestore/scripts/verify.py
new file mode 100755
index 0000000..0b1ec14
--- /dev/null
+++ b/tools/libxc/saverestore/scripts/verify.py
@@ -0,0 +1,330 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+import sys
+import struct
+
+from streamspec import *
+
+class StreamError(StandardError):
+ pass
+
+class RecordError(StandardError):
+ pass
+
+def skip_xl_header(stream):
+
+ magic = stream.read(8)
+ if magic != "mat\n \0 \r":
+ return False
+
+ header = stream.read(16)
+ if len(header) != 16:
+ return False
+
+ _, _, _, optlen = struct.unpack("=IIII", header)
+
+ optdata = stream.read(optlen)
+ if len(optdata) != optlen:
+ return False
+
+ return True
+
+
+def verify_ihdr(stream):
+ """ Verify an image header """
+
+ datasz = struct.calcsize(IHDR_FORMAT)
+ data = stream.read(datasz)
+
+ # xl header record?
+ if data == "Xen saved domain, xl for":
+ if skip_xl_header(stream):
+ data = stream.read(datasz)
+ else:
+ raise StreamError("Invalid looking xl header on the stream")
+
+ if len(data) != datasz:
+ raise IOError("Truncated stream")
+
+ marker, id, version, options, res1, res2 = struct.unpack(IHDR_FORMAT, data)
+
+ if marker != 0xffffffffffffffff:
+ raise StreamError("Bad image marker: Expected 0xffffffffffffffff, "
+ "got 0x%x" % (marker, ))
+
+ if id != 0x58454e46:
+ raise StreamError("Bad image id: Expected 0x58454e46, got 0x%x"
+ % (id, ))
+
+ if version != 1:
+ raise StreamError("Unknown image version: Expected 1, got %d"
+ % (version, ))
+
+ if options & IHDR_OPT_RESZ_MASK:
+ raise StreamError("Reserved bits set in image options field: 0x%x"
+ % (options & IHDR_OPT_RESZ_MASK))
+
+ if res1 != 0 or res2 != 0:
+ raise StreamError("Reserved bits set in image header: 0x%04x:0x%08x"
+ % (res1, res2))
+
+ if ( sys.byteorder == "little" and
+ (options & IHDR_OPT_BE) != IHDR_OPT_LE ):
+ raise StreamError("Stream is not native endianness - unable to validate")
+
+ print "Valid Image Header:",
+ if options & IHDR_OPT_BE:
+ print "big endian"
+ else:
+ print "little endian"
+
+def verify_dhdr(stream):
+ """ Verify a domain header """
+
+ datasz = struct.calcsize(DHDR_FORMAT)
+ data = stream.read(datasz)
+
+ if len(data) != datasz:
+ raise IOError("Truncated stream")
+
+ type, page_shift, res1, major, minor = struct.unpack(DHDR_FORMAT, data)
+
+ if type not in dhdr_type_to_str:
+ raise StreamError("Unrecognised domain type 0x%x" % (type, ))
+
+ if res1 != 0:
+ raise StreamError("Reserved bits set in domain header 0x%04x"
+ % (res1, ))
+
+ if page_shift != 12:
+ raise StreamError("Page shift expected to be 12. Got %d"
+ % (page_shift, ))
+
+ print "Valid Domain Header: %s from Xen %d.%d (page sz %d)" \
+ % (dhdr_type_to_str[type], major, minor, 2**page_shift)
+
+
+def verify_record_end(content):
+
+ if len(content) != 0:
+ raise RecordError("End record with non-zero length")
+
+def verify_page_data(content):
+ minsz = struct.calcsize(PAGE_DATA_FORMAT)
+
+ if len(content) <= minsz:
+ raise RecordError("PAGE_DATA record must be at least %d bytes long"
+ % (minsz, ))
+
+ count, res1 = struct.unpack_from(PAGE_DATA_FORMAT, content)
+
+ if res1 != 0:
+ raise StreamError("Reserved bits set in PAGE_DATA record 0x%04x"
+ % (res1, ))
+
+ pfnsz = count * 8
+ if (len(content) - minsz) < pfnsz:
+ raise RecordError("PAGE_DATA record must contain a pfn record for "
+ "each count")
+
+ pfns = list(struct.unpack_from("=%dQ" % (count,), content, minsz))
+
+ nr_pages = 0
+ for idx, pfn in enumerate(pfns):
+
+ if pfn & PAGE_DATA_PFN_RESZ_MASK:
+ raise RecordError("Reserved bits set in pfn[%d]: 0x%016x"
+ % (idx, pfn & PAGE_DATA_PFN_RESZ_MASK))
+
+ if pfn >> PAGE_DATA_TYPE_SHIFT in (5, 6, 7, 8):
+ raise RecordError("Invalid type value in pfn[%d]: 0x%016x"
+ % (idx, pfn & PAGE_DATA_TYPE_LTAB_MASK))
+
+ # We expect page data for each normal page or pagetable
+ if PAGE_DATA_TYPE_NOTAB <= (pfn & PAGE_DATA_TYPE_LTABTYPE_MASK) <= PAGE_DATA_TYPE_L4TAB:
+ nr_pages += 1
+
+ pagesz = nr_pages * 4096
+ if len(content) != minsz + pfnsz + pagesz:
+ raise RecordError("Wrong size")
+
+
+def verify_record_x86_pv_vcpu_generic(content, name):
+ # Generic for both REC_TYPE_x86_pv_vcpu_{basic,extended}
+ minsz = struct.calcsize(X86_PV_VCPU_FORMAT)
+
+ if len(content) <= minsz:
+ raise RecordError("X86_PV_VCPU_%s record length must be at least %d"
+ " bytes long" % (name, minsz))
+
+ vcpuid, res1 = struct.unpack_from(X86_PV_VCPU_FORMAT, content)
+
+ if res1 != 0:
+ raise StreamError("Reserved bits set in x86_pv_vcpu_%s record 0x%04x"
+ % (name, res1))
+
+ print " vcpu%d %s context, %d bytes" % (vcpuid, name, len(content) - minsz)
+
+def verify_record_x86_pv_vcpu_xsave(content):
+ minsz = struct.calcsize(X86_PV_VCPU_XSAVE_FORMAT)
+
+ if len(content) <= minsz:
+ raise RecordError("X86_PV_VCPU_XSAVE record length must be at least %d"
+ " bytes long" % (minsz, ))
+
+ vcpuid, res1, xmask = struct.unpack_from(X86_PV_VCPU_XSAVE_FORMAT,
+ content)
+
+ if res1 != 0:
+ raise StreamError("Reserved bits set in X86_PV_VCPU_XSAVE record "
+ "0x%04x" % (res1, ))
+
+ print " vcpu%d xsave context, mask 0x%x" % (vcpuid, xmask)
+
+
+def verify_x86_pv_info(content):
+
+ if len(content) != 3:
+ raise RecordError("x86_pv_info: expected length of 3, got %d"
+ % (len(content), ))
+
+ width, levels, options = struct.unpack(X86_PV_INFO_FORMAT, content)
+
+ if width not in (4, 8):
+ raise RecordError("Expected width of 4 or 8, got %d" % (width, ))
+
+ if levels not in (3, 4):
+ raise RecordError("Expected levels of 3 or 4, got %d" % (levels, ))
+
+ if (options & X86_PV_INFO_OPT_RESZ_MASK) != 0:
+ raise StreamError("Reserved bits set in X86_PV_INFO options: 0x%02x"
+ % (options & X86_PV_INFO_OPT_RESZ_MASK, ))
+
+ bitness = {4:32, 8:64}[width]
+
+ print " %sbit guest, %d levels of pagetables" % (bitness, levels)
+
+def verify_x86_pv_p2m_frames(content):
+
+ if len(content) % 8 != 0:
+ raise RecordError("Length expected to be a multiple of 8, not %d"
+ % (len(content), ))
+
+ start, end = struct.unpack_from("=II", content)
+
+ print " Start pfn 0x%x, End 0x%x" % (start, end)
+
+def verify_record_x86_pv_shared_info(content):
+
+ if len(content) != 4096:
+ raise RecordError("Length expected to be 4096 bytes, not %d"
+ % (len(content), ))
+
+def verify_record_tsc_info(content):
+
+ sz = struct.calcsize(TSC_INFO_FORMAT)
+
+ if len(content) != sz:
+ raise RecordError("Length should be %u bytes" % (sz, ))
+
+ mode, khz, nsec, incarn = struct.unpack(TSC_INFO_FORMAT, content)
+ print (" Mode %u, %u kHz, %u ns, incarnation %d"
+ % (mode, khz, nsec, incarn))
+
+record_verifiers = {
+ REC_TYPE_end : verify_record_end,
+ REC_TYPE_page_data : verify_page_data,
+
+ REC_TYPE_x86_pv_info: verify_x86_pv_info,
+ REC_TYPE_x86_pv_p2m_frames: verify_x86_pv_p2m_frames,
+
+ REC_TYPE_x86_pv_vcpu_basic :
+ lambda x: verify_record_x86_pv_vcpu_generic(x, "basic"),
+ REC_TYPE_x86_pv_vcpu_extended :
+ lambda x: verify_record_x86_pv_vcpu_generic(x, "extended"),
+ REC_TYPE_x86_pv_vcpu_xsave : verify_record_x86_pv_vcpu_xsave,
+
+ REC_TYPE_x86_pv_shared_info: verify_record_x86_pv_shared_info,
+ REC_TYPE_tsc_info: verify_record_tsc_info,
+}
+
+_squahsed_data_records = 0
+def verify_record(stream):
+ """ Verify a record """
+ global _squahsed_data_records
+
+ datasz = struct.calcsize(RH_FORMAT)
+ data = stream.read(datasz)
+
+ if len(data) != datasz:
+ raise IOError("Truncated stream")
+
+ type, length = struct.unpack(RH_FORMAT, data)
+
+ if type not in rec_type_to_str:
+ raise StreamError("Unrecognised record type %x" % (type, ))
+
+ contentsz = (length + 7) & ~7
+ content = stream.read(contentsz)
+
+ if len(content) != contentsz:
+ raise IOError("Truncated stream")
+
+ padding = content[length:]
+ if padding != "\x00" * len(padding):
+ raise StreamError("Padding containing non-zero bytes found")
+
+ if type != REC_TYPE_page_data:
+
+ if _squahsed_data_records > 0:
+ print ("Squashed %d valid Page Data records together"
+ % (_squahsed_data_records, ))
+ _squahsed_data_records = 0
+
+ print ("Valid Record Header: %s, length %d"
+ % (rec_type_to_str[type], length))
+
+ else:
+ _squahsed_data_records += 1
+
+ if type not in record_verifiers:
+ raise RuntimeError("No verification function")
+ else:
+ record_verifiers[type](content[:length])
+
+ return type
+
+
+def main(argv = sys.argv):
+
+ if len(argv) == 2:
+ fin = open(argv[1], "rb")
+ else:
+ fin = sys.stdin
+
+ try:
+ verify_ihdr(fin)
+ verify_dhdr(fin)
+
+ while verify_record(fin) != REC_TYPE_end:
+ pass
+
+ if fin.read(1) != "":
+ raise StreamError("Junk found on the end of the stream")
+
+ except (IOError, StreamError, RecordError) as e:
+ print "Error: ", e
+ return 1
+
+ except RuntimeError as e:
+ print "Script error", e
+ print "Please fix me"
+ return 2
+
+ print "Done"
+ return 0
+
+
+if __name__ == "__main__":
+ sys.exit(main(sys.argv))
--
1.7.10.4
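
For anyone wanting to experiment with the framing rules that verify_record()
enforces, below is a minimal Python 3 sketch (the script in this patch is
Python 2). It assumes an 8-byte record header made of two little-endian 32-bit
fields (type, length); RH_FORMAT_SKETCH and read_record() are illustrative
names and are not part of the patch.

```python
import io
import struct

# Assumption: the record header is (type, length), two little-endian uint32s.
RH_FORMAT_SKETCH = "<II"


def read_record(stream):
    """Read one record; return (type, body) with the padding checked and removed."""
    hdr_size = struct.calcsize(RH_FORMAT_SKETCH)
    hdr = stream.read(hdr_size)
    if len(hdr) != hdr_size:
        raise IOError("Truncated stream")

    rec_type, length = struct.unpack(RH_FORMAT_SKETCH, hdr)

    # Record bodies are padded with zeroes up to the next multiple of 8 bytes.
    padded = (length + 7) & ~7
    content = stream.read(padded)
    if len(content) != padded:
        raise IOError("Truncated stream")

    if content[length:] != b"\x00" * (padded - length):
        raise ValueError("Padding contains non-zero bytes")

    return rec_type, content[:length]


# Example: a 3-byte body is followed by 5 bytes of zero padding.
example = struct.pack(RH_FORMAT_SKETCH, 5, 3) + b"abc" + b"\x00" * 5
rec_type, body = read_record(io.BytesIO(example))
```

The length in the header counts only the real body; the padding is implied by
rounding up, which is why the reader has to recompute it rather than read it
from the stream.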
* [PATCH 4/6] tools/libxc: x86 pv common code
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
` (2 preceding siblings ...)
2014-04-09 18:28 ` [PATCH 3/6] tools/libxc: Scripts for inspection/valdiation of legacy and new streams Andrew Cooper
@ 2014-04-09 18:28 ` Andrew Cooper
2014-04-09 18:28 ` [PATCH 5/6] tools/libxc: x86 pv save implementation Andrew Cooper
` (3 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-09 18:28 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Frediano Ziglio
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libxc/saverestore/common.h | 79 ++++++++++++
tools/libxc/saverestore/common_x86_pv.c | 208 +++++++++++++++++++++++++++++++
tools/libxc/saverestore/common_x86_pv.h | 105 ++++++++++++++++
3 files changed, 392 insertions(+)
create mode 100644 tools/libxc/saverestore/common_x86_pv.c
create mode 100644 tools/libxc/saverestore/common_x86_pv.h
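
As a note on cr3_to_mfn()/mfn_to_cr3() in this patch: for 32bit guests the
conversion is a 12-bit rotation within 32 bits, so that the top bits of a wide
mfn end up in the low 12 bits of cr3. The following Python 3 sketch mirrors
only the 32bit branch; MASK32 and the *_32 function names are made up for
illustration.

```python
MASK32 = 0xFFFFFFFF


def cr3_to_mfn_32(cr3):
    """Rotate a 32-bit cr3 value right by 12 bits to recover the mfn."""
    cr3 &= MASK32
    return ((cr3 >> 12) | (cr3 << 20)) & MASK32


def mfn_to_cr3_32(mfn):
    """Rotate a 32-bit mfn left by 12 bits to build the cr3 value."""
    mfn &= MASK32
    return ((mfn << 12) | (mfn >> 20)) & MASK32
```

The two functions are inverses of each other for any 32-bit input, which is
what lets the save side swap an mfn for a pfn in place.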
diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index fff0a39..4220c18 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -1,8 +1,16 @@
#ifndef __COMMON__H
#define __COMMON__H
+// Hack out junk from the namespace
+#define mfn_to_pfn __UNUSED_mfn_to_pfn
+#define pfn_to_mfn __UNUSED_pfn_to_mfn
+
#include "../xg_private.h"
+#undef mfn_to_pfn
+#undef pfn_to_mfn
+
+
#include "stream_format.h"
// TODO: Find a better place to put this...
@@ -11,6 +19,77 @@
const char *dhdr_type_to_str(uint32_t type);
const char *rec_type_to_str(uint32_t type);
+struct context
+{
+ xc_interface *xch;
+ uint32_t domid;
+ int fd;
+
+ xc_dominfo_t dominfo;
+
+ union
+ {
+ struct
+ {
+ /* From Image Header */
+ uint32_t format_version;
+
+ /* From Domain Header */
+ uint32_t guest_type;
+ uint32_t guest_page_size;
+
+ unsigned long xenstore_mfn, console_mfn;
+ unsigned int xenstore_evtchn, console_evtchn;
+ domid_t xenstore_domid, console_domid;
+ } restore;
+
+ struct
+ {
+ struct save_callbacks *callbacks;
+ } save;
+ };
+
+ union
+ {
+ struct
+ {
+ /* 4 or 8; 32 or 64 bit domain */
+ unsigned int width;
+ /* 3 or 4 pagetable levels */
+ unsigned int levels;
+
+
+ /* Maximum Xen frame */
+ unsigned long max_mfn;
+ /* Read-only machine to phys map */
+ xen_pfn_t *m2p;
+ /* First m2p mfn. Incorrectly based on toolstack bitness - FIXME */
+ xen_pfn_t m2p_mfn0;
+ /* Number of m2p frames mapped */
+ unsigned long nr_m2p_frames;
+
+
+ /* Maximum guest frame */
+ unsigned long max_pfn;
+ /* Frames per page in guest p2m */
+ unsigned int fpp;
+
+ /* Number of frames making up the p2m */
+ unsigned int p2m_frames;
+ /* Guest's phys to machine map. Mapped read-only (save) or
+ * allocated locally (restore). Uses guest unsigned longs. */
+ void *p2m;
+ /* The guest pfns containing the p2m leaves */
+ xen_pfn_t *p2m_pfns;
+ /* Types for each page */
+ uint32_t *pfn_types;
+
+ /* Read-only mapping of guest's shared info page */
+ shared_info_any_t *shinfo;
+ } x86_pv;
+ };
+};
+
#endif
/*
* Local variables:
diff --git a/tools/libxc/saverestore/common_x86_pv.c b/tools/libxc/saverestore/common_x86_pv.c
new file mode 100644
index 0000000..71fb13d
--- /dev/null
+++ b/tools/libxc/saverestore/common_x86_pv.c
@@ -0,0 +1,208 @@
+#include <assert.h>
+
+#include "common_x86_pv.h"
+
+xen_pfn_t mfn_to_pfn(struct context *ctx, xen_pfn_t mfn)
+{
+ assert(mfn <= ctx->x86_pv.max_mfn);
+ return ctx->x86_pv.m2p[mfn];
+}
+
+xen_pfn_t pfn_to_mfn(struct context *ctx, xen_pfn_t pfn)
+{
+ assert(pfn <= ctx->x86_pv.max_pfn);
+
+ if ( ctx->x86_pv.width == sizeof (uint64_t) )
+ /* 64 bit guest. Need to truncate their pfns for 32 bit toolstacks */
+ return ((uint64_t *)ctx->x86_pv.p2m)[pfn];
+ else
+ {
+ /* 32 bit guest. Need to expand INVALID_MFN for 64 bit toolstacks */
+ uint32_t mfn = ((uint32_t *)ctx->x86_pv.p2m)[pfn];
+
+ return mfn == ~0U ? INVALID_MFN : mfn;
+ }
+}
+
+void set_p2m(struct context *ctx, xen_pfn_t pfn, xen_pfn_t mfn)
+{
+ assert(pfn <= ctx->x86_pv.max_pfn);
+
+ if ( ctx->x86_pv.width == sizeof (uint64_t) )
+ /* 64 bit guest. Need to expand INVALID_MFN for 32 bit toolstacks */
+ ((uint64_t *)ctx->x86_pv.p2m)[pfn] = mfn == INVALID_MFN ? ~0ULL : mfn;
+ else
+ /* 32 bit guest. Can safely truncate INVALID_MFN for 64 bit toolstacks */
+ ((uint32_t *)ctx->x86_pv.p2m)[pfn] = mfn;
+}
+
+bool mfn_in_pseudophysmap(struct context *ctx, xen_pfn_t mfn)
+{
+ return ( (mfn <= ctx->x86_pv.max_mfn) &&
+ (mfn_to_pfn(ctx, mfn) <= ctx->x86_pv.max_pfn) &&
+ (pfn_to_mfn(ctx, mfn_to_pfn(ctx, mfn)) == mfn) );
+}
+
+void pseudophysmap_walk(struct context *ctx, xen_pfn_t mfn)
+{
+ xc_interface *xch = ctx->xch;
+ xen_pfn_t pfn = ~0UL;
+
+ ERROR("mfn %#lx, max %#lx", mfn, ctx->x86_pv.max_mfn);
+
+ if ( (mfn != ~0UL) && (mfn <= ctx->x86_pv.max_mfn) )
+ {
+ pfn = ctx->x86_pv.m2p[mfn];
+ ERROR(" m2p[%#lx] = %#lx, max_pfn %#lx",
+ mfn, pfn, ctx->x86_pv.max_pfn);
+ }
+
+ if ( (pfn != ~0UL) && (pfn <= ctx->x86_pv.max_pfn) )
+ ERROR(" p2m[%#lx] = %#lx",
+ pfn, pfn_to_mfn(ctx, pfn));
+}
+
+xen_pfn_t cr3_to_mfn(struct context *ctx, uint64_t cr3)
+{
+ if ( ctx->x86_pv.width == 8 )
+ return cr3 >> 12;
+ else
+ return (((uint32_t)cr3 >> 12) | ((uint32_t)cr3 << 20));
+}
+
+uint64_t mfn_to_cr3(struct context *ctx, xen_pfn_t mfn)
+{
+ if ( ctx->x86_pv.width == 8 )
+ return ((uint64_t)mfn) << 12;
+ else
+ return (((uint32_t)mfn << 12) | ((uint32_t)mfn >> 20));
+}
+
+int x86_pv_domain_info(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ unsigned int guest_width, guest_levels, fpp;
+ int max_pfn;
+
+ /* Get the domain width */
+ if ( xc_domain_get_guest_width(xch, ctx->domid, &guest_width) )
+ {
+ PERROR("Unable to determine dom%d's width", ctx->domid);
+ return -1;
+ }
+ else if ( guest_width == 4 )
+ guest_levels = 3;
+ else if ( guest_width == 8 )
+ guest_levels = 4;
+ else
+ {
+ ERROR("Invalid guest width %u bits. Expected 32 or 64", guest_width * 8);
+ return -1;
+ }
+ ctx->x86_pv.width = guest_width;
+ ctx->x86_pv.levels = guest_levels;
+ ctx->x86_pv.fpp = fpp = PAGE_SIZE / ctx->x86_pv.width;
+
+ DPRINTF("%d bits, %d levels", guest_width * 8, guest_levels);
+
+ /* Get the domain's maximum pfn */
+ max_pfn = xc_domain_maximum_gpfn(xch, ctx->domid);
+ if ( max_pfn < 0 )
+ {
+ PERROR("Unable to obtain guest's max pfn");
+ return -1;
+ }
+ else if ( max_pfn >= ~XEN_DOMCTL_PFINFO_LTAB_MASK )
+ {
+ errno = E2BIG;
+ PERROR("Cannot save a guest this large %#x", max_pfn);
+ return -1;
+ }
+ else if ( max_pfn > 0 )
+ {
+ ctx->x86_pv.max_pfn = max_pfn;
+ ctx->x86_pv.p2m_frames = (ctx->x86_pv.max_pfn + fpp) / fpp;
+
+ DPRINTF("max_pfn %#x, p2m_frames %d", max_pfn, ctx->x86_pv.p2m_frames);
+ }
+
+ return 0;
+}
+
+int x86_pv_map_m2p(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ long max_page = xc_maximum_ram_page(xch);
+ unsigned long m2p_chunks, m2p_size;
+ privcmd_mmap_entry_t *entries = NULL;
+ xen_pfn_t *extents_start = NULL;
+ int rc = -1, i;
+
+ if ( max_page < 0 )
+ {
+ PERROR("Failed to get maximum ram page");
+ goto err;
+ }
+
+ ctx->x86_pv.max_mfn = max_page;
+ m2p_size = M2P_SIZE(ctx->x86_pv.max_mfn);
+ m2p_chunks = M2P_CHUNKS(ctx->x86_pv.max_mfn);
+
+ extents_start = malloc(m2p_chunks * sizeof(xen_pfn_t));
+ if ( !extents_start )
+ {
+ ERROR("Unable to allocate %zu bytes for m2p mfns",
+ m2p_chunks * sizeof(xen_pfn_t));
+ goto err;
+ }
+
+ if ( xc_machphys_mfn_list(xch, m2p_chunks, extents_start) )
+ {
+ PERROR("Failed to get m2p mfn list");
+ goto err;
+ }
+
+ entries = malloc(m2p_chunks * sizeof(privcmd_mmap_entry_t));
+ if ( !entries )
+ {
+ ERROR("Unable to allocate %zu bytes for m2p mapping mfns",
+ m2p_chunks * sizeof(privcmd_mmap_entry_t));
+ goto err;
+ }
+
+ for ( i = 0; i < m2p_chunks; ++i )
+ entries[i].mfn = extents_start[i];
+
+ ctx->x86_pv.m2p = xc_map_foreign_ranges(
+ xch, DOMID_XEN, m2p_size, PROT_READ,
+ M2P_CHUNK_SIZE, entries, m2p_chunks);
+
+ if ( !ctx->x86_pv.m2p )
+ {
+ PERROR("Failed to mmap m2p ranges");
+ goto err;
+ }
+
+ ctx->x86_pv.nr_m2p_frames = (M2P_CHUNK_SIZE >> PAGE_SHIFT) * m2p_chunks;
+ ctx->x86_pv.m2p_mfn0 = entries[0].mfn;
+
+ /* All Done */
+ rc = 0;
+ DPRINTF("max_mfn %#lx", ctx->x86_pv.max_mfn);
+
+err:
+ free(entries);
+ free(extents_start);
+
+ return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
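
The check that mfn_in_pseudophysmap() is meant to perform is a round trip: the
mfn must be in range of the m2p, the pfn it maps to must be in range of the
p2m, and the p2m entry for that pfn must point back at the same mfn. A minimal
Python 3 sketch, using plain lists as stand-ins for the two tables (an
assumption made purely for illustration; the C code uses max_mfn/max_pfn as
inclusive bounds instead of list lengths):

```python
def mfn_in_pseudophysmap(m2p, p2m, mfn):
    """Return True if mfn is in range and the m2p and p2m agree about it."""
    if mfn >= len(m2p):
        return False

    pfn = m2p[mfn]
    if pfn >= len(p2m):
        return False

    # The guest's p2m must map the pfn back to the mfn we started with.
    return p2m[pfn] == mfn
```

A stale m2p entry (one whose pfn now belongs to a different mfn) fails the
final comparison, which is the case the range checks alone would miss.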
diff --git a/tools/libxc/saverestore/common_x86_pv.h b/tools/libxc/saverestore/common_x86_pv.h
new file mode 100644
index 0000000..ba1b60f
--- /dev/null
+++ b/tools/libxc/saverestore/common_x86_pv.h
@@ -0,0 +1,105 @@
+#ifndef __COMMON_X86_PV_H
+#define __COMMON_X86_PV_H
+
+#include <stdbool.h>
+#include "common.h"
+
+/*
+ * Convert an mfn to a pfn, given Xen's m2p table.
+ *
+ * Caller must ensure that the requested mfn is in range.
+ */
+xen_pfn_t mfn_to_pfn(struct context *ctx, xen_pfn_t mfn);
+
+/*
+ * Convert a pfn to an mfn, given the guest's p2m table.
+ *
+ * Caller must ensure that the requested pfn is in range.
+ */
+xen_pfn_t pfn_to_mfn(struct context *ctx, xen_pfn_t pfn);
+
+/*
+ * Set a mapping in the p2m table.
+ *
+ * Caller must ensure that the requested pfn is in range.
+ */
+void set_p2m(struct context *ctx, xen_pfn_t pfn, xen_pfn_t mfn);
+
+/*
+ * Query whether a particular mfn is valid in the physmap of a guest.
+ */
+bool mfn_in_pseudophysmap(struct context *ctx, xen_pfn_t mfn);
+
+/*
+ * Debug a particular mfn by walking the p2m and m2p.
+ */
+void pseudophysmap_walk(struct context *ctx, xen_pfn_t mfn);
+
+/*
+ * Convert a PV cr3 field to an mfn.
+ */
+xen_pfn_t cr3_to_mfn(struct context *ctx, uint64_t cr3);
+
+/*
+ * Convert an mfn to a PV cr3 field.
+ */
+uint64_t mfn_to_cr3(struct context *ctx, xen_pfn_t mfn);
+
+/*
+ * Extract an MFN from a Pagetable Entry.
+ */
+static inline xen_pfn_t pte_to_frame(struct context *ctx, uint64_t pte)
+{
+ if ( ctx->x86_pv.width == 8 )
+ return (pte >> PAGE_SHIFT) & ((1ULL << (52 - PAGE_SHIFT)) - 1);
+ else
+ return (pte >> PAGE_SHIFT) & ((1ULL << (44 - PAGE_SHIFT)) - 1);
+}
+
+static inline void update_pte(struct context *ctx, uint64_t *pte, xen_pfn_t pfn)
+{
+ if ( ctx->x86_pv.width == 8 )
+ *pte &= ~(((1ULL << (52 - PAGE_SHIFT)) - 1) << PAGE_SHIFT);
+ else
+ *pte &= ~(((1ULL << (44 - PAGE_SHIFT)) - 1) << PAGE_SHIFT);
+
+ *pte |= (uint64_t)pfn << PAGE_SHIFT;
+}
+
+/*
+ * Get current domain information.
+ *
+ * Fills ctx->x86_pv
+ * - .width
+ * - .levels
+ * - .fpp
+ * - .p2m_frames
+ *
+ * Used by the save side to create the X86_PV_INFO record, and by the restore
+ * side to verify the incoming stream.
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int x86_pv_domain_info(struct context *ctx);
+
+/*
+ * Maps the Xen M2P.
+ *
+ * Fills ctx->x86_pv.
+ * - .max_mfn
+ * - .m2p
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int x86_pv_map_m2p(struct context *ctx);
+
+#endif
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
1.7.10.4
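
To illustrate the masks used by pte_to_frame() and update_pte() above, here is
a small Python 3 sketch. It takes the guest width as a plain argument rather
than a context structure, and frame_mask() is a helper invented for the
example; the 52 and 44 bit limits are taken from the header in this patch.

```python
PAGE_SHIFT = 12


def frame_mask(width):
    """Mask covering the frame-number bits of a PTE, shifted down to bit 0."""
    bits = 52 if width == 8 else 44
    return (1 << (bits - PAGE_SHIFT)) - 1


def pte_to_frame(width, pte):
    """Extract the frame number from a pagetable entry."""
    return (pte >> PAGE_SHIFT) & frame_mask(width)


def update_pte(width, pte, pfn):
    """Return pte with its frame number replaced by pfn; flag bits are kept."""
    pte &= ~(frame_mask(width) << PAGE_SHIFT)
    return pte | (pfn << PAGE_SHIFT)
```

Only the frame field is rewritten, so the low flag bits and any high bits
outside the mask (bit 63, for example) survive the mfn to pfn conversion.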
* [PATCH 5/6] tools/libxc: x86 pv save implementation
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
` (3 preceding siblings ...)
2014-04-09 18:28 ` [PATCH 4/6] tools/libxc: x86 pv common code Andrew Cooper
@ 2014-04-09 18:28 ` Andrew Cooper
2014-04-09 18:28 ` [PATCH 6/6] tools/libxc: x86 pv restore implementation Andrew Cooper
` (2 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-09 18:28 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Frediano Ziglio
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libxc/saverestore/common.c | 36 ++
tools/libxc/saverestore/common.h | 53 +++
tools/libxc/saverestore/save.c | 33 +-
tools/libxc/saverestore/save_x86_pv.c | 843 +++++++++++++++++++++++++++++++++
4 files changed, 964 insertions(+), 1 deletion(-)
create mode 100644 tools/libxc/saverestore/save_x86_pv.c
diff --git a/tools/libxc/saverestore/common.c b/tools/libxc/saverestore/common.c
index d2dfd5a..df18447 100644
--- a/tools/libxc/saverestore/common.c
+++ b/tools/libxc/saverestore/common.c
@@ -1,3 +1,5 @@
+#include <assert.h>
+
#include "common.h"
static const char *dhdr_types[] =
@@ -48,6 +50,40 @@ const char *rec_type_to_str(uint32_t type)
return "Reserved";
}
+int write_split_record(struct context *ctx, struct record *rec,
+ void *buf, size_t sz)
+{
+ static const char zeroes[7] = { 0 };
+ xc_interface *xch = ctx->xch;
+ uint32_t combined_length = rec->length + sz;
+ size_t record_length = (combined_length + 7) & ~7UL;
+
+ if ( record_length > REC_LENGTH_MAX )
+ {
+ ERROR("Record (0x%08"PRIx32", %s) length 0x%"PRIx32
+ " exceeds max (0x%"PRIx32")", rec->type,
+ rec_type_to_str(rec->type), combined_length, REC_LENGTH_MAX);
+ return -1;
+ }
+
+ if ( rec->length )
+ assert(rec->data);
+ if ( sz )
+ assert(buf);
+
+ if ( write_exact(ctx->fd, &rec->type, sizeof rec->type) ||
+ write_exact(ctx->fd, &combined_length, sizeof rec->length) ||
+ (rec->length && write_exact(ctx->fd, rec->data, rec->length)) ||
+ (sz && write_exact(ctx->fd, buf, sz)) ||
+ write_exact(ctx->fd, zeroes, record_length - combined_length) )
+ {
+ PERROR("Unable to write record to stream");
+ return -1;
+ }
+
+ return 0;
+}
+
/*
* Local variables:
* mode: C
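
The on-stream layout produced by write_split_record() can be sketched in
Python 3 as follows. This is an illustration only: pack_record() is a
hypothetical name, and the little-endian header encoding is an assumption (the
C code writes the fields in native byte order).

```python
import struct


def pack_record(rec_type, header, blob=b""):
    """Build one record: type, combined length, header, blob, zero padding."""
    combined = len(header) + len(blob)

    # The length field covers header + blob; the padding is not counted.
    padded = (combined + 7) & ~7

    return (struct.pack("<II", rec_type, combined)
            + header
            + blob
            + b"\x00" * (padded - combined))
```

Passing an empty blob gives the plain write_record() case, which is why the C
version can implement one in terms of the other.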
diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index 4220c18..a2c8cee 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -6,7 +6,10 @@
#define pfn_to_mfn __UNUSED_pfn_to_mfn
#include "../xg_private.h"
+#include "../xg_save_restore.h"
+#undef GET_FIELD
+#undef SET_FIELD
#undef mfn_to_pfn
#undef pfn_to_mfn
@@ -90,6 +93,56 @@ struct context
};
};
+/* Saves an x86 PV domain. */
+int save_x86_pv(struct context *ctx);
+
+struct record
+{
+ uint32_t type;
+ uint32_t length;
+ void *data;
+};
+
+/* Gets a field from an *_any union */
+#define GET_FIELD(_c, _p, _f) \
+ ({ (_c)->x86_pv.width == 8 ? \
+ (_p)->x64._f: \
+ (_p)->x32._f; \
+ })
+
+/* Sets a field in an *_any union */
+#define SET_FIELD(_c, _p, _f, _v) \
+ ({ if ( (_c)->x86_pv.width == 8 ) \
+ (_p)->x64._f = (_v); \
+ else \
+ (_p)->x32._f = (_v); \
+ })
+
+/*
+ * Writes a split record to the stream, applying correct padding where
+ * appropriate. It is common when sending records containing blobs from Xen
+ * that the header and blob data are separate. This function accepts a second
+ * buffer and length, and will merge it with the main record when sending.
+ *
+ * Records with a non-zero length must provide a valid data field; records
+ * with a 0 length shall have their data field ignored.
+ *
+ * Returns 0 on success and non-zero on failure.
+ */
+int write_split_record(struct context *ctx, struct record *rec, void *buf, size_t sz);
+
+/*
+ * Writes a record to the stream, applying correct padding where appropriate.
+ * Records with a non-zero length must provide a valid data field; records
+ * with a 0 length shall have their data field ignored.
+ *
+ * Returns 0 on success and non-zero on failure.
+ */
+static inline int write_record(struct context *ctx, struct record *rec)
+{
+ return write_split_record(ctx, rec, NULL, 0);
+}
+
#endif
/*
* Local variables:
diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
index c013e62..e86e5fc 100644
--- a/tools/libxc/saverestore/save.c
+++ b/tools/libxc/saverestore/save.c
@@ -5,8 +5,39 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
struct save_callbacks* callbacks, int hvm,
unsigned long vm_generationid_addr)
{
+ struct context ctx =
+ {
+ .xch = xch,
+ .fd = io_fd,
+ };
+
+ /* Older GCC can't initialise anonymous unions */
+ ctx.save.callbacks = callbacks;
+
IPRINTF("In experimental %s", __func__);
- return -1;
+
+ if ( xc_domain_getinfo(xch, dom, 1, &ctx.dominfo) != 1 )
+ {
+ PERROR("Failed to get domain info");
+ return -1;
+ }
+
+ if ( ctx.dominfo.domid != dom )
+ {
+ ERROR("Domain %d does not exist", dom);
+ return -1;
+ }
+
+ ctx.domid = dom;
+ IPRINTF("Saving domain %d", dom);
+
+ if ( ctx.dominfo.hvm )
+ {
+ ERROR("HVM Save not supported yet");
+ return -1;
+ }
+ else
+ return save_x86_pv(&ctx);
}
/*
diff --git a/tools/libxc/saverestore/save_x86_pv.c b/tools/libxc/saverestore/save_x86_pv.c
new file mode 100644
index 0000000..9f6703d
--- /dev/null
+++ b/tools/libxc/saverestore/save_x86_pv.c
@@ -0,0 +1,843 @@
+#include <assert.h>
+#include <arpa/inet.h>
+
+#include "common_x86_pv.h"
+
+static int write_headers(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ int32_t xen_version = xc_version(xch, XENVER_version, NULL);
+ struct ihdr ihdr =
+ {
+ .marker = IHDR_MARKER,
+ .id = htonl(IHDR_ID),
+ .version = htonl(IHDR_VERSION),
+ .options = htons(IHDR_OPT_LITTLE_ENDIAN),
+ };
+ struct dhdr dhdr =
+ {
+ .type = DHDR_TYPE_x86_pv,
+ .page_shift = 12,
+ .xen_major = (xen_version >> 16) & 0xffff,
+ .xen_minor = (xen_version) & 0xffff,
+ };
+
+ if ( xen_version < 0 )
+ {
+ PERROR("Unable to obtain Xen Version");
+ return -1;
+ }
+
+ if ( write_exact(ctx->fd, &ihdr, sizeof ihdr) )
+ {
+ PERROR("Unable to write Image Header to stream");
+ return -1;
+ }
+
+ if ( write_exact(ctx->fd, &dhdr, sizeof dhdr) )
+ {
+ PERROR("Unable to write Domain Header to stream");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int map_shinfo(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+
+ ctx->x86_pv.shinfo = xc_map_foreign_range(
+ xch, ctx->domid, PAGE_SIZE, PROT_READ, ctx->dominfo.shared_info_frame);
+ if ( !ctx->x86_pv.shinfo )
+ {
+ PERROR("Failed to map shared info frame at pfn %#lx",
+ ctx->dominfo.shared_info_frame);
+ return -1;
+ }
+
+ return 0;
+}
+
+static void copy_pfns_from_guest(struct context *ctx, xen_pfn_t *dst,
+ void *src, size_t count)
+{
+ size_t x;
+
+ if ( ctx->x86_pv.width == sizeof(unsigned long) )
+ memcpy(dst, src, count * sizeof *dst);
+ else
+ {
+ for ( x = 0; x < count; ++x )
+ {
+#ifdef __x86_64__
+ /* 64bit toolstack, 32bit guest. Expand any INVALID_MFN. */
+ uint32_t s = ((uint32_t *)src)[x];
+
+ dst[x] = s == ~0U ? INVALID_MFN : s;
+#else
+ /* 32bit toolstack, 64bit guest. Truncate their pointers */
+ dst[x] = ((uint64_t *)src)[x];
+#endif
+ }
+ }
+
+}
+
+static int map_p2m(struct context *ctx)
+{
+ /* Terminology:
+ *
+ * fll - frame list list, top level p2m, list of fl mfns
+ * fl - frame list, mid level p2m, list of leaf mfns
+ * local - own allocated buffers, adjusted for bitness
+ * guest - mappings into the domain
+ */
+ xc_interface *xch = ctx->xch;
+ int rc = -1;
+ unsigned tries = 100, x, fpp, fll_entries, fl_entries;
+ xen_pfn_t fll_mfn;
+
+ xen_pfn_t *local_fll = NULL;
+ void *guest_fll = NULL;
+ size_t local_fll_size;
+
+ xen_pfn_t *local_fl = NULL;
+ void *guest_fl = NULL;
+ size_t local_fl_size;
+
+ fpp = ctx->x86_pv.fpp = PAGE_SIZE / ctx->x86_pv.width;
+ fll_entries = (ctx->x86_pv.max_pfn / (fpp * fpp)) + 1;
+ fl_entries = (ctx->x86_pv.max_pfn / fpp) + 1;
+
+ fll_mfn = GET_FIELD(ctx, ctx->x86_pv.shinfo, arch.pfn_to_mfn_frame_list_list);
+ if ( !fll_mfn )
+ IPRINTF("Waiting for domain to set up its p2m frame list list");
+
+ while ( tries-- && !fll_mfn )
+ {
+ usleep(10000);
+ fll_mfn = GET_FIELD(ctx, ctx->x86_pv.shinfo,
+ arch.pfn_to_mfn_frame_list_list);
+ }
+
+ if ( !fll_mfn )
+ {
+ ERROR("Timed out waiting for p2m frame list list to be updated");
+ goto err;
+ }
+
+ /* Map the guest top p2m */
+ guest_fll = xc_map_foreign_range(xch, ctx->domid, PAGE_SIZE,
+ PROT_READ, fll_mfn);
+ if ( !guest_fll )
+ {
+ PERROR("Failed to map p2m frame list list at %#lx", fll_mfn);
+ goto err;
+ }
+
+ local_fll_size = fll_entries * sizeof *local_fll;
+ local_fll = malloc(local_fll_size);
+ if ( !local_fll )
+ {
+ ERROR("Cannot allocate %zu bytes for local p2m frame list list",
+ local_fll_size);
+ goto err;
+ }
+
+ copy_pfns_from_guest(ctx, local_fll, guest_fll, fll_entries);
+
+ /* Map the guest mid p2m frames */
+ guest_fl = xc_map_foreign_pages(xch, ctx->domid, PROT_READ,
+ local_fll, fll_entries);
+ if ( !guest_fl )
+ {
+ PERROR("Failed to map p2m frame list");
+ goto err;
+ }
+
+ local_fl_size = fl_entries * sizeof *local_fl;
+ local_fl = malloc(local_fl_size);
+ if ( !local_fl )
+ {
+ ERROR("Cannot allocate %zu bytes for local p2m frame list",
+ local_fl_size);
+ goto err;
+ }
+
+ copy_pfns_from_guest(ctx, local_fl, guest_fl, fl_entries);
+
+ /* Map the p2m leaves themselves */
+ ctx->x86_pv.p2m = xc_map_foreign_pages(xch, ctx->domid, PROT_READ,
+ local_fl, fl_entries);
+ if ( !ctx->x86_pv.p2m )
+ {
+ PERROR("Failed to map p2m frames");
+ goto err;
+ }
+
+ ctx->x86_pv.p2m_frames = fl_entries;
+ ctx->x86_pv.p2m_pfns = malloc(local_fl_size);
+ if ( !ctx->x86_pv.p2m_pfns )
+ {
+ ERROR("Cannot allocate %zu bytes for p2m pfns list",
+ local_fl_size);
+ goto err;
+ }
+
+ /* Convert leaf frames from mfns to pfns */
+ for ( x = 0; x < fl_entries; ++x )
+ if ( !mfn_in_pseudophysmap(ctx, local_fl[x]) )
+ {
+ ERROR("Bad MFN in p2m_frame_list[%u]", x);
+ pseudophysmap_walk(ctx, local_fl[x]);
+ errno = ERANGE;
+ goto err;
+ }
+ else
+ ctx->x86_pv.p2m_pfns[x] = mfn_to_pfn(ctx, local_fl[x]);
+
+ rc = 0;
+err:
+
+ free(local_fl);
+ if ( guest_fl )
+ munmap(guest_fl, fll_entries * PAGE_SIZE);
+
+ free(local_fll);
+ if ( guest_fll )
+ munmap(guest_fll, PAGE_SIZE);
+
+ return rc;
+}
+
+static int write_one_vcpu_basic(struct context *ctx, uint32_t id)
+{
+ xc_interface *xch = ctx->xch;
+ xen_pfn_t mfn, pfn;
+ unsigned i;
+ int rc = -1;
+ vcpu_guest_context_any_t vcpu;
+ struct rec_x86_pv_vcpu vhdr = { .vcpu_id = id };
+ struct record rec =
+ {
+ .type = REC_TYPE_x86_pv_vcpu_basic,
+ .length = sizeof vhdr,
+ .data = &vhdr,
+ };
+
+ if ( xc_vcpu_getcontext(xch, ctx->domid, id, &vcpu) )
+ {
+ PERROR("Failed to get vcpu%u context", id);
+ goto err;
+ }
+
+ /* Vcpu 0 is special: Convert the suspend record to a PFN */
+ if ( id == 0 )
+ {
+ mfn = GET_FIELD(ctx, &vcpu, user_regs.edx);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Bad MFN for suspend record");
+ pseudophysmap_walk(ctx, mfn);
+ errno = ERANGE;
+ goto err;
+ }
+ SET_FIELD(ctx, &vcpu, user_regs.edx, mfn_to_pfn(ctx, mfn));
+ }
+
+ /* Convert GDT frames to PFNs */
+ for ( i = 0; (i * 512) < GET_FIELD(ctx, &vcpu, gdt_ents); ++i )
+ {
+ mfn = GET_FIELD(ctx, &vcpu, gdt_frames[i]);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Bad MFN for frame %u of vcpu%u's GDT", i, id);
+ pseudophysmap_walk(ctx, mfn);
+ errno = ERANGE;
+ goto err;
+ }
+ SET_FIELD(ctx, &vcpu, gdt_frames[i], mfn_to_pfn(ctx, mfn));
+ }
+
+ /* Convert CR3 to a PFN */
+ mfn = cr3_to_mfn(ctx, GET_FIELD(ctx, &vcpu, ctrlreg[3]));
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Bad MFN for vcpu%u's cr3", id);
+ pseudophysmap_walk(ctx, mfn);
+ errno = ERANGE;
+ goto err;
+ }
+ pfn = mfn_to_pfn(ctx, mfn);
+ SET_FIELD(ctx, &vcpu, ctrlreg[3], mfn_to_cr3(ctx, pfn));
+
+ /* 64bit guests: Convert CR1 (guest pagetables) to PFN */
+ if ( ctx->x86_pv.levels == 4 && vcpu.x64.ctrlreg[1] )
+ {
+ mfn = vcpu.x64.ctrlreg[1] >> PAGE_SHIFT;
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Bad MFN for vcpu%u's cr1", id);
+ pseudophysmap_walk(ctx, mfn);
+ errno = ERANGE;
+ goto err;
+ }
+
+ pfn = mfn_to_pfn(ctx, mfn);
+ vcpu.x64.ctrlreg[1] = 1 | ((uint64_t)pfn << PAGE_SHIFT);
+ }
+
+ if ( ctx->x86_pv.width == 8 )
+ rc = write_split_record(ctx, &rec, &vcpu, sizeof vcpu.x64);
+ else
+ rc = write_split_record(ctx, &rec, &vcpu, sizeof vcpu.x32);
+
+ if ( rc )
+ goto err;
+
+ DPRINTF("Writing vcpu%u basic context", id);
+ rc = 0;
+ err:
+
+ return rc;
+}
+
+static int write_one_vcpu_extended(struct context *ctx, uint32_t id)
+{
+ xc_interface *xch = ctx->xch;
+ int rc;
+ struct rec_x86_pv_vcpu vhdr = { .vcpu_id = id };
+ struct record rec =
+ {
+ .type = REC_TYPE_x86_pv_vcpu_extended,
+ .length = sizeof vhdr,
+ .data = &vhdr,
+ };
+ DECLARE_DOMCTL;
+
+ domctl.cmd = XEN_DOMCTL_get_ext_vcpucontext;
+ domctl.domain = ctx->domid;
+ domctl.u.ext_vcpucontext.vcpu = id;
+
+ if ( xc_domctl(xch, &domctl) < 0 )
+ {
+ PERROR("Unable to get vcpu%u extended context", id);
+ return -1;
+ }
+
+ rc = write_split_record(ctx, &rec, &domctl.u.ext_vcpucontext,
+ domctl.u.ext_vcpucontext.size);
+ if ( rc )
+ return rc;
+
+ DPRINTF("Writing vcpu%u extended context", id);
+
+ return 0;
+}
+
+static int write_one_vcpu_xsave(struct context *ctx, uint32_t id)
+{
+ xc_interface *xch = ctx->xch;
+ int rc = -1;
+ DECLARE_DOMCTL;
+ DECLARE_HYPERCALL_BUFFER(void, buffer);
+ struct rec_x86_pv_vcpu_xsave vhdr = { .vcpu_id = id };
+ struct record rec =
+ {
+ .type = REC_TYPE_x86_pv_vcpu_xsave,
+ .length = sizeof vhdr,
+ .data = &vhdr,
+ };
+
+ domctl.cmd = XEN_DOMCTL_getvcpuextstate;
+ domctl.domain = ctx->domid;
+ domctl.u.vcpuextstate.vcpu = id;
+ domctl.u.vcpuextstate.xfeature_mask = 0;
+ domctl.u.vcpuextstate.size = 0;
+
+ if ( xc_domctl(xch, &domctl) < 0 )
+ {
+ PERROR("Unable to get vcpu%u's xsave context", id);
+ goto err;
+ }
+
+ if ( !domctl.u.vcpuextstate.xfeature_mask )
+ {
+ DPRINTF("vcpu%u has no xsave context - skipping", id);
+ goto out;
+ }
+
+ buffer = xc_hypercall_buffer_alloc(xch, buffer, domctl.u.vcpuextstate.size);
+ if ( !buffer )
+ {
+ ERROR("Unable to allocate %"PRIx64" bytes for vcpu%u's xsave context",
+ domctl.u.vcpuextstate.size, id);
+ goto err;
+ }
+
+ set_xen_guest_handle(domctl.u.vcpuextstate.buffer, buffer);
+ if ( xc_domctl(xch, &domctl) < 0 )
+ {
+ PERROR("Unable to get vcpu%u's xsave context", id);
+ goto err;
+ }
+
+ vhdr.xfeature_mask = domctl.u.vcpuextstate.xfeature_mask;
+
+ rc = write_split_record(ctx, &rec, buffer, domctl.u.vcpuextstate.size);
+ if ( rc )
+ goto err;
+
+ DPRINTF("Writing vcpu%u xsave context", id);
+
+ out:
+ rc = 0;
+
+ err:
+ xc_hypercall_buffer_free(xch, buffer);
+
+ return rc;
+}
+
+static int write_all_vcpu_information(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ xc_vcpuinfo_t vinfo;
+ unsigned int i;
+ int rc;
+
+ for ( i = 0; i <= ctx->dominfo.max_vcpu_id; ++i )
+ {
+ rc = xc_vcpu_getinfo(xch, ctx->domid, i, &vinfo);
+ if ( rc )
+ {
+ PERROR("Failed to get vcpu%u information", i);
+ return rc;
+ }
+
+ if ( !vinfo.online )
+ {
+ DPRINTF("vcpu%u offline - skipping", i);
+ continue;
+ }
+
+ rc = write_one_vcpu_basic(ctx, i) ?:
+ write_one_vcpu_extended(ctx, i) ?:
+ write_one_vcpu_xsave(ctx, i);
+ if ( rc )
+ return rc;
+ }
+
+ return 0;
+}
+
+static int normalise_pagetable(struct context *ctx, const uint64_t *src,
+ uint64_t *dst, unsigned long type)
+{
+ xc_interface *xch = ctx->xch;
+ uint64_t pte;
+ unsigned i, xen_first = -1, xen_last = -1; /* Indices of Xen mappings */
+
+ type &= XEN_DOMCTL_PFINFO_LTABTYPE_MASK;
+
+ if ( ctx->x86_pv.levels == 4 )
+ {
+ /* 64bit guests only have Xen mappings in their L4 tables */
+ if ( type == XEN_DOMCTL_PFINFO_L4TAB )
+ {
+ xen_first = 256;
+ xen_last = 271;
+ }
+ }
+ else
+ {
+ switch ( type )
+ {
+ case XEN_DOMCTL_PFINFO_L4TAB:
+ ERROR("??? Found L4 table for 32bit guest");
+ return -1;
+
+ case XEN_DOMCTL_PFINFO_L3TAB:
+ /* 32bit guests can only use the first 4 entries of their L3 tables.
+ * All others are potentially used by Xen. */
+ xen_first = 4;
+ xen_last = 512;
+ break;
+
+ case XEN_DOMCTL_PFINFO_L2TAB:
+ /* It is hard to spot Xen mappings in a 32bit guest's L2. Most
+ * are normal but only a few will have Xen mappings.
+ *
+ * 428 = (HYPERVISOR_VIRT_START_PAE >> L2_PAGETABLE_SHIFT_PAE) & 0x1ff
+ *
+ * ...which is conveniently unavailable to us in a 64bit build.
+ * But not to worry, because ctx->m2p_mfn0 depends on the bitness
+ * of the toolstack anyway, meaning that a 64bit toolstack can't
+ * spot 32bit guest Xen mappings... (nor could the old migration
+ * code, but was hidden by a further bug)
+ */
+ if ( pte_to_frame(ctx, src[428]) == ctx->x86_pv.m2p_mfn0 )
+ {
+ xen_first = 428;
+ xen_last = 512;
+ }
+ break;
+ }
+ }
+
+ for ( i = 0; i < (PAGE_SIZE / sizeof(uint64_t)); ++i )
+ {
+ xen_pfn_t mfn, pfn;
+
+ pte = src[i];
+
+ /* Remove Xen mappings: Xen will reconstruct on the other side */
+ if ( i >= xen_first && i <= xen_last )
+ pte = 0;
+
+ if ( pte & _PAGE_PRESENT )
+ {
+ mfn = pte_to_frame(ctx, pte);
+
+ if ( pte & _PAGE_PSE )
+ {
+ ERROR("It is impossible to migrate PV guests using superpages");
+ return -1;
+ }
+
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Bad MFN for L%lu[%u]",
+ type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT, i);
+ pseudophysmap_walk(ctx, mfn);
+ errno = ERANGE;
+ return -1;
+ }
+ else
+ pfn = mfn_to_pfn(ctx, mfn);
+
+ update_pte(ctx, &pte, pfn);
+ }
+
+ dst[i] = pte;
+ }
+
+ return 0;
+}
+
+static int write_all_memory(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ xen_pfn_t x = 0, mfn, type;
+ int rc = -1, err;
+ void *guest_page = NULL;
+ void *local_page = malloc(PAGE_SIZE);
+
+ struct
+ {
+ struct rec_page_data_header h;
+ uint64_t pfn;
+ } page_data = { { 1, 0 } , 0 };
+
+ struct record rec =
+ {
+ .type = REC_TYPE_page_data,
+ .length = sizeof page_data,
+ .data = &page_data,
+ };
+
+ XC_BUILD_BUG_ON(sizeof page_data != 16);
+
+ if ( !local_page )
+ {
+ ERROR("Unable to allocate local scratch page");
+ goto err;
+ }
+
+ for ( x = 0; x <= ctx->x86_pv.max_pfn; ++x )
+ {
+ type = mfn = pfn_to_mfn(ctx, x);
+
+ if ( xc_get_pfn_type_batch(xch, ctx->domid, 1, &type) )
+ {
+ PERROR("Unable to get mfn %#lx type", mfn);
+ goto err;
+ }
+
+ if ( (type & ~XEN_DOMCTL_PFINFO_LTAB_MASK) ||
+ (((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) >= 5) &&
+ ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) <= 8)) )
+ {
+ ERROR("Invalid type %#lx for mfn %#lx", type, mfn);
+ goto err;
+ }
+
+ page_data.pfn = (((uint64_t)type) << 32) | x;
+
+ switch (type)
+ {
+ case XEN_DOMCTL_PFINFO_BROKEN:
+ case XEN_DOMCTL_PFINFO_XALLOC:
+ case XEN_DOMCTL_PFINFO_XTAB:
+ if ( write_record(ctx, &rec) )
+ goto err;
+ continue;
+ }
+
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Bad pfn %#lx", x);
+ pseudophysmap_walk(ctx, mfn);
+ goto err;
+ }
+
+ guest_page = xc_map_foreign_bulk(
+ xch, ctx->domid, PROT_READ, &mfn, &err, 1);
+ if ( !guest_page || err )
+ {
+ PERROR("Unable to map mfn %#lx (err %d)", mfn, err);
+ goto err;
+ }
+
+ switch (type & XEN_DOMCTL_PFINFO_LTABTYPE_MASK)
+ {
+ case XEN_DOMCTL_PFINFO_L1TAB:
+ case XEN_DOMCTL_PFINFO_L2TAB:
+ case XEN_DOMCTL_PFINFO_L3TAB:
+ case XEN_DOMCTL_PFINFO_L4TAB:
+ if ( normalise_pagetable(ctx, guest_page, local_page, type) ||
+ write_split_record(ctx, &rec, local_page, PAGE_SIZE) )
+ goto err;
+ break;
+
+ case XEN_DOMCTL_PFINFO_NOTAB:
+ if ( write_split_record(ctx, &rec, guest_page, PAGE_SIZE) )
+ goto err;
+ break;
+ }
+
+ munmap(guest_page, PAGE_SIZE);
+ guest_page = NULL;
+ }
+
+
+ DPRINTF("Finished All Memory");
+ rc = 0;
+
+ err:
+ if ( guest_page )
+ munmap(guest_page, PAGE_SIZE);
+ free(local_page);
+
+ return rc;
+}
+
+static int write_x86_pv_info(struct context *ctx)
+{
+ struct rec_x86_pv_info info =
+ {
+ .guest_width = ctx->x86_pv.width,
+ .pt_levels = ctx->x86_pv.levels,
+ };
+ struct record rec =
+ {
+ .type = REC_TYPE_x86_pv_info,
+ .length = sizeof info,
+ .data = &info
+ };
+
+ return write_record(ctx, &rec);
+}
+
+static int write_x86_pv_p2m_frames(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ int rc; unsigned i;
+ size_t datasz = ctx->x86_pv.p2m_frames * sizeof(uint64_t);
+ uint64_t *data = NULL;
+ struct rec_x86_pv_p2m_frames hdr =
+ {
+ .start_pfn = 0,
+ .end_pfn = ctx->x86_pv.max_pfn,
+ };
+ struct record rec =
+ {
+ .type = REC_TYPE_x86_pv_p2m_frames,
+ .length = sizeof hdr,
+ .data = &hdr,
+ };
+
+ /* No need to translate if sizeof(uint64_t) == sizeof(xen_pfn_t) */
+ if ( sizeof(uint64_t) != sizeof(*ctx->x86_pv.p2m_pfns) )
+ {
+ if ( !(data = malloc(datasz)) )
+ {
+ ERROR("Cannot allocate %zu bytes for X86_PV_P2M_FRAMES data", datasz);
+ return -1;
+ }
+
+ for ( i = 0; i < ctx->x86_pv.p2m_frames; ++i )
+ data[i] = ctx->x86_pv.p2m_pfns[i];
+ }
+ else
+ data = (uint64_t *)ctx->x86_pv.p2m_pfns;
+
+ rc = write_split_record(ctx, &rec, data, datasz);
+
+ if ( data != (uint64_t *)ctx->x86_pv.p2m_pfns )
+ free(data);
+
+ return rc;
+}
+
+static int write_x86_pv_shared_info(struct context *ctx)
+{
+ struct record rec =
+ {
+ .type = REC_TYPE_x86_pv_shared_info,
+ .length = PAGE_SIZE,
+ .data = ctx->x86_pv.shinfo,
+ };
+
+ return write_record(ctx, &rec);
+}
+
+static int write_tsc_info(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ struct rec_tsc_info tsc = { 0 };
+ struct record rec =
+ {
+ .type = REC_TYPE_tsc_info,
+ .length = sizeof tsc,
+ .data = &tsc
+ };
+
+ if ( xc_domain_get_tsc_info(xch, ctx->domid, &tsc.mode,
+ &tsc.nsec, &tsc.khz, &tsc.incarnation) < 0 )
+ {
+ PERROR("Unable to obtain TSC information");
+ return -1;
+ }
+
+ return write_record(ctx, &rec);
+}
+
+int save_x86_pv(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ int rc;
+ struct record end = { REC_TYPE_end, 0, NULL };
+
+ IPRINTF("In experimental %s", __func__);
+
+ /* TODO - make this a little more live... */
+ if ( !ctx->dominfo.paused )
+ {
+ rc = (ctx->save.callbacks->suspend(ctx->save.callbacks->data) != 1);
+ if ( rc )
+ {
+ ERROR("Failed to suspend domain");
+ goto err;
+ }
+ }
+
+ /* Write Image and Domain headers to the stream */
+ rc = write_headers(ctx);
+ if ( rc )
+ goto err;
+
+ /* Query some properties, and stash in the save context */
+ rc = x86_pv_domain_info(ctx);
+ if ( rc )
+ goto err;
+
+ /* Write an X86_PV_INFO record into the stream */
+ rc = write_x86_pv_info(ctx);
+ if ( rc )
+ goto err;
+
+ /* Map various structures */
+ rc = x86_pv_map_m2p(ctx) ?: map_shinfo(ctx) ?: map_p2m(ctx);
+ if ( rc )
+ goto err;
+
+ /* Write a full X86_PV_P2M_FRAMES record into the stream */
+ rc = write_x86_pv_p2m_frames(ctx);
+ if ( rc )
+ goto err;
+
+ /* DOMAIN MUST BE PAUSED FROM THIS POINT ONWARDS */
+
+
+ rc = write_all_memory(ctx); /* TODO: only valid for non-live migrate */
+ if ( rc )
+ goto err;
+
+ rc = write_tsc_info(ctx);
+ if ( rc )
+ goto err;
+
+ rc = write_x86_pv_shared_info(ctx);
+ if ( rc )
+ goto err;
+
+ /* Refresh domain information now it has paused */
+ if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
+ (ctx->dominfo.domid != ctx->domid) )
+ {
+ PERROR("Unable to refresh domain information");
+ rc = -1;
+ goto err;
+ }
+ else if ( (!ctx->dominfo.shutdown ||
+ ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
+ !ctx->dominfo.paused )
+ {
+ ERROR("Domain has not been suspended");
+ rc = -1;
+ goto err;
+ }
+
+ /* Write all the vcpu information */
+ rc = write_all_vcpu_information(ctx);
+ if ( rc )
+ goto err;
+
+ /* Write an END record */
+ rc = write_record(ctx, &end);
+ if ( rc )
+ goto err;
+
+ /* all done */
+ assert(!rc);
+ goto cleanup;
+
+ err:
+ assert(rc);
+ cleanup:
+
+ free(ctx->x86_pv.p2m_pfns);
+
+ if ( ctx->x86_pv.p2m )
+ munmap(ctx->x86_pv.p2m, ctx->x86_pv.p2m_frames * PAGE_SIZE);
+
+ if ( ctx->x86_pv.shinfo )
+ munmap(ctx->x86_pv.shinfo, PAGE_SIZE);
+
+ if ( ctx->x86_pv.m2p )
+ munmap(ctx->x86_pv.m2p, ctx->x86_pv.nr_m2p_frames * PAGE_SIZE);
+
+ return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
1.7.10.4
* [PATCH 6/6] tools/libxc: x86 pv restore implementation
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
` (4 preceding siblings ...)
2014-04-09 18:28 ` [PATCH 5/6] tools/libxc: x86 pv save implementation Andrew Cooper
@ 2014-04-09 18:28 ` Andrew Cooper
2014-04-10 10:42 ` [PATCH 0/6] [VERY RFC] Migration Stream v2 Ian Campbell
2014-04-23 13:47 ` Ian Campbell
7 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-09 18:28 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Frediano Ziglio
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
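A note for reviewers on the PAGE_DATA entry layout: the save side builds each 64-bit entry as (type << 32) | pfn, and handle_page_data() in this patch splits it again with PAGE_DATA_PFN_MASK and PAGE_DATA_TYPE_MASK. The sketch below is only an illustration of that round trip. The EX_* names and mask values are assumptions (the real masks live in the stream format header, which is not part of this mail), and the type constants are assumed to mirror the XEN_DOMCTL_PFINFO_* values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror XEN_DOMCTL_PFINFO_*; not taken from this patch. */
#define EX_PFINFO_LTAB_SHIFT 28
#define EX_PFINFO_L1TAB      (UINT32_C(0x1) << EX_PFINFO_LTAB_SHIFT)
#define EX_PFINFO_XTAB       (UINT32_C(0xf) << EX_PFINFO_LTAB_SHIFT)

/* Hypothetical masks: pfn in the low 52 bits, type in the top 4 bits. */
#define EX_PAGE_DATA_PFN_MASK  UINT64_C(0x000fffffffffffff)
#define EX_PAGE_DATA_TYPE_MASK UINT64_C(0xf000000000000000)

/* Pack a 32-bit page type and a pfn into one stream entry. */
static uint64_t ex_pack_entry(uint32_t type, uint64_t pfn)
{
    return ((uint64_t)type << 32) | (pfn & EX_PAGE_DATA_PFN_MASK);
}

/* Recover the pfn from a stream entry. */
static uint64_t ex_entry_pfn(uint64_t entry)
{
    return entry & EX_PAGE_DATA_PFN_MASK;
}

/* Recover the 32-bit page type from a stream entry. */
static uint32_t ex_entry_type(uint64_t entry)
{
    return (uint32_t)((entry & EX_PAGE_DATA_TYPE_MASK) >> 32);
}
```

Unlike the save path in this series, the sketch masks the pfn while packing, so an out-of-range pfn cannot corrupt the type bits.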
tools/libxc/saverestore/common.c | 51 ++
tools/libxc/saverestore/common.h | 35 ++
tools/libxc/saverestore/restore.c | 112 +++-
tools/libxc/saverestore/restore_x86_pv.c | 977 ++++++++++++++++++++++++++++++
4 files changed, 1174 insertions(+), 1 deletion(-)
create mode 100644 tools/libxc/saverestore/restore_x86_pv.c
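For anyone checking the length validation in handle_x86_pv_p2m_frames(): the record carries one 64-bit pfn per p2m frame covering start_pfn..end_pfn inclusive. A minimal sketch of that arithmetic follows. It assumes fpp means "p2m entries per 4k frame" (PAGE_SIZE / guest width); the ex_* helpers are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed meaning of fpp: p2m entries that fit in one 4k frame. */
static unsigned ex_fpp(unsigned guest_width)
{
    return 4096 / guest_width;
}

/* Number of p2m frames needed to cover pfns start_pfn..end_pfn inclusive. */
static unsigned ex_nr_p2m_frames(uint32_t start_pfn, uint32_t end_pfn,
                                 unsigned fpp)
{
    unsigned start = start_pfn / fpp;      /* first frame index */
    unsigned end = end_pfn / fpp + 1;      /* one past the last frame index */

    return end - start;
}

/* Expected record length: fixed header plus one uint64_t per frame. */
static size_t ex_expected_length(size_t hdr_size, unsigned nr_frames)
{
    return hdr_size + (size_t)nr_frames * sizeof(uint64_t);
}
```

A copy loop over the frame list therefore visits indices 0 to nr_frames - 1, which is exactly the amount of data the length check admits.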
diff --git a/tools/libxc/saverestore/common.c b/tools/libxc/saverestore/common.c
index df18447..dbfae21 100644
--- a/tools/libxc/saverestore/common.c
+++ b/tools/libxc/saverestore/common.c
@@ -84,6 +84,57 @@ int write_split_record(struct context *ctx, struct record *rec,
return 0;
}
+int read_record(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ struct rhdr rhdr;
+ size_t datasz;
+
+ if ( read_exact(ctx->fd, &rhdr, sizeof rhdr) )
+ {
+ PERROR("Failed to read Record Header from stream");
+ return -1;
+ }
+ else if ( rhdr.length > REC_LENGTH_MAX )
+ {
+ ERROR("Record (0x%08"PRIx32", %s) length 0x%"PRIx32
+ " exceeds max (0x%"PRIx32")",
+ rhdr.type, rec_type_to_str(rhdr.type),
+ rhdr.length, REC_LENGTH_MAX);
+ return -1;
+ }
+
+ datasz = (rhdr.length + 7) & ~7U;
+
+ if ( datasz )
+ {
+ rec->data = malloc(datasz);
+
+ if ( !rec->data )
+ {
+ ERROR("Unable to allocate %zu bytes for record data (0x%08"PRIx32", %s)",
+ datasz, rhdr.type, rec_type_to_str(rhdr.type));
+ return -1;
+ }
+
+ if ( read_exact(ctx->fd, rec->data, datasz) )
+ {
+ free(rec->data);
+ rec->data = NULL;
+ PERROR("Failed to read %zu bytes of data for record (0x%08"PRIx32", %s)",
+ datasz, rhdr.type, rec_type_to_str(rhdr.type));
+ return -1;
+ }
+ }
+ else
+ rec->data = NULL;
+
+ rec->type = rhdr.type;
+ rec->length = rhdr.length;
+
+ return 0;
+}
+
/*
* Local variables:
* mode: C
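Side note on read_record() above: it reads the body rounded up to the next multiple of 8 bytes, while rec->length keeps the unpadded value from the header. The following sketch shows the rounding and the resulting on-stream footprint of a record. struct ex_rhdr is a hypothetical stand-in for struct rhdr, assumed to be two 32-bit fields as the PRIx32 format strings suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the record header: type and body length. */
struct ex_rhdr {
    uint32_t type;
    uint32_t length;
};

/* Round a body length up to the next multiple of 8 bytes. */
static size_t ex_padded_size(uint32_t length)
{
    return ((size_t)length + 7) & ~(size_t)7;
}

/* Bytes one record occupies on the stream: header plus padded body. */
static size_t ex_record_footprint(uint32_t length)
{
    return sizeof(struct ex_rhdr) + ex_padded_size(length);
}
```

The sketch does the rounding in size_t, so a length close to UINT32_MAX cannot wrap on a 64-bit build; the patch avoids the same problem by rejecting lengths above REC_LENGTH_MAX first.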
diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index a2c8cee..249e18f 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -7,9 +7,12 @@
#include "../xg_private.h"
#include "../xg_save_restore.h"
+#include "../xc_dom.h"
#undef GET_FIELD
#undef SET_FIELD
+#undef MEMCPY_FIELD
+#undef MEMSET_ARRAY_FIELD
#undef mfn_to_pfn
#undef pfn_to_mfn
@@ -95,6 +98,8 @@ struct context
/* Saves an x86 PV domain. */
int save_x86_pv(struct context *ctx);
+/* Restores an x86 PV domain. */
+int restore_x86_pv(struct context *ctx);
struct record
{
@@ -118,6 +123,22 @@ struct record
(_p)->x32._f = (_v); \
})
+/* memcpy field _f from _s to _d, of an *_any union */
+#define MEMCPY_FIELD(_c, _d, _s, _f) \
+ ({ if ( (_c)->x86_pv.width == 8 ) \
+ memcpy(&(_d)->x64._f, &(_s)->x64._f, sizeof((_d)->x64._f)); \
+ else \
+ memcpy(&(_d)->x32._f, &(_s)->x32._f, sizeof((_d)->x32._f)); \
+ })
+
+/* memset array field _f with value _v, from an *_any union */
+#define MEMSET_ARRAY_FIELD(_c, _d, _f, _v) \
+ ({ if ( (_c)->x86_pv.width == 8 ) \
+ memset(&(_d)->x64._f[0], (_v), sizeof((_d)->x64._f)); \
+ else \
+ memset(&(_d)->x32._f[0], (_v), sizeof((_d)->x32._f)); \
+ })
+
/*
* Writes a split record to the stream, applying correct padding where
* appropriate. It is common when sending records containing blobs from Xen
@@ -143,6 +164,20 @@ static inline int write_record(struct context *ctx, struct record *rec)
return write_split_record(ctx, rec, NULL, 0);
}
+/*
+ * Reads a record from the stream, and fills in the record structure.
+ *
+ * Returns 0 on success and non-0 on failure.
+ *
+ * On success, the record's type and length shall be valid.
+ * - If length is 0, data shall be NULL.
+ * - If length is non-0, data shall be a buffer allocated by malloc() which must
+ * be passed to free() by the caller.
+ *
+ * On failure, the contents of the record structure are undefined.
+ */
+int read_record(struct context *ctx, struct record *rec);
+
#endif
/*
* Local variables:
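To illustrate what MEMCPY_FIELD and MEMSET_ARRAY_FIELD do: they pick the 32-bit or 64-bit member of an *_any union based on the guest width, so sizeof always matches the layout actually in use. The sketch below shows the same dispatch with plain functions. The ex_* types are made up for the example and are not the real shared_info layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit and 64-bit layouts of the same logical structure. */
struct ex_info_x32 { uint32_t flags; uint32_t pending[4]; };
struct ex_info_x64 { uint64_t flags; uint64_t pending[4]; };

/* An "any" union in the style of shared_info_any_t. */
typedef union {
    struct ex_info_x32 x32;
    struct ex_info_x64 x64;
} ex_info_any_t;

/* Fill the pending array with byte value v, choosing the layout by width. */
static void ex_memset_pending(unsigned width, ex_info_any_t *d, int v)
{
    if ( width == 8 )
        memset(&d->x64.pending[0], v, sizeof d->x64.pending);
    else
        memset(&d->x32.pending[0], v, sizeof d->x32.pending);
}

/* Zero a union, set its pending array to 0xff, and count the 0xff bytes. */
static size_t ex_count_set_bytes(unsigned width)
{
    ex_info_any_t info;
    const unsigned char *p = (const unsigned char *)&info;
    size_t i, n = 0;

    memset(&info, 0, sizeof info);
    ex_memset_pending(width, &info, 0xff);

    for ( i = 0; i < sizeof info; ++i )
        n += (p[i] == 0xff);

    return n;
}
```

Only the array belonging to the selected layout is touched: 16 bytes for a 4-byte-wide guest and 32 bytes for an 8-byte-wide one.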
diff --git a/tools/libxc/saverestore/restore.c b/tools/libxc/saverestore/restore.c
index 6624baa..6937aec 100644
--- a/tools/libxc/saverestore/restore.c
+++ b/tools/libxc/saverestore/restore.c
@@ -1,5 +1,62 @@
+#include <arpa/inet.h>
+
#include "common.h"
+static int read_headers(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ struct ihdr ihdr;
+ struct dhdr dhdr;
+
+ if ( read_exact(ctx->fd, &ihdr, sizeof ihdr) )
+ {
+ PERROR("Failed to read Image Header from stream");
+ return -1;
+ }
+
+ ihdr.id = ntohl(ihdr.id);
+ ihdr.version = ntohl(ihdr.version);
+ ihdr.options = ntohs(ihdr.options);
+
+ if ( ihdr.marker != IHDR_MARKER )
+ {
+ ERROR("Invalid marker: Got 0x%016"PRIx64, ihdr.marker);
+ return -1;
+ }
+ else if ( ihdr.id != IHDR_ID )
+ {
+ ERROR("Invalid ID: Expected 0x%08"PRIx32", Got 0x%08"PRIx32,
+ IHDR_ID, ihdr.id);
+ return -1;
+ }
+ else if ( ihdr.version != IHDR_VERSION )
+ {
+ ERROR("Invalid Version: Expected %d, Got %d", IHDR_VERSION, ihdr.version);
+ return -1;
+ }
+ else if ( ihdr.options & IHDR_OPT_BIG_ENDIAN )
+ {
+ ERROR("Unable to handle big endian streams");
+ return -1;
+ }
+
+ ctx->restore.format_version = ihdr.version;
+
+ if ( read_exact(ctx->fd, &dhdr, sizeof dhdr) )
+ {
+ PERROR("Failed to read Domain Header from stream");
+ return -1;
+ }
+
+ ctx->restore.guest_type = dhdr.type;
+ ctx->restore.guest_page_size = (1U << dhdr.page_shift);
+
+ IPRINTF("Found %s domain from Xen %d.%d",
+ dhdr_type_to_str(dhdr.type), dhdr.xen_major, dhdr.xen_minor);
+ return 0;
+}
+
+
int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
unsigned int store_evtchn, unsigned long *store_mfn,
domid_t store_domid, unsigned int console_evtchn,
@@ -8,8 +65,61 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
int checkpointed_stream,
struct restore_callbacks *callbacks)
{
+ struct context ctx =
+ {
+ .xch = xch,
+ .fd = io_fd,
+ };
+
+ ctx.restore.console_evtchn = console_evtchn;
+ ctx.restore.console_domid = console_domid;
+ ctx.restore.xenstore_evtchn = store_evtchn;
+ ctx.restore.xenstore_domid = store_domid;
+
IPRINTF("In experimental %s", __func__);
- return -1;
+
+ if ( xc_domain_getinfo(xch, dom, 1, &ctx.dominfo) != 1 )
+ {
+ PERROR("Failed to get domain info");
+ return -1;
+ }
+
+ if ( ctx.dominfo.domid != dom )
+ {
+ ERROR("Domain %d does not exist", dom);
+ return -1;
+ }
+
+ ctx.domid = dom;
+ IPRINTF("Restoring domain %d", dom);
+
+ if ( read_headers(&ctx) )
+ return -1;
+
+ if ( ctx.dominfo.hvm )
+ {
+ ERROR("HVM Restore not supported yet");
+ return -1;
+ }
+ else
+ {
+ if ( restore_x86_pv(&ctx) )
+ return -1;
+
+ DPRINTF("XenStore: mfn %#lx, dom %d, evt %u",
+ ctx.restore.xenstore_mfn,
+ ctx.restore.xenstore_domid,
+ ctx.restore.xenstore_evtchn);
+
+ DPRINTF("Console: mfn %#lx, dom %d, evt %u",
+ ctx.restore.console_mfn,
+ ctx.restore.console_domid,
+ ctx.restore.console_evtchn);
+
+ *console_mfn = ctx.restore.console_mfn;
+ *store_mfn = ctx.restore.xenstore_mfn;
+ return 0;
+ }
}
/*
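As a reading aid for read_headers() above: the id, version and options fields of the image header are converted from network byte order before being compared, while the marker is compared as-is. A rough sketch of that validation order follows. The struct layout and every EX_* constant are assumptions for illustration and should be checked against the stream specification rather than taken from here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define EX_IHDR_MARKER         UINT64_C(0xffffffffffffffff) /* assumed */
#define EX_IHDR_ID             UINT32_C(0x58454e46)         /* assumed */
#define EX_IHDR_VERSION        1                            /* assumed */
#define EX_IHDR_OPT_BIG_ENDIAN 0x1                          /* assumed */

/* Hypothetical image header layout. */
struct ex_ihdr {
    uint64_t marker;
    uint32_t id;        /* network byte order on the wire */
    uint32_t version;   /* network byte order on the wire */
    uint16_t options;   /* network byte order on the wire */
    uint16_t res1;
    uint32_t res2;
};

/* Returns 0 if the header is acceptable, else a distinct negative code. */
static int ex_check_ihdr(struct ex_ihdr h)
{
    h.id = ntohl(h.id);
    h.version = ntohl(h.version);
    h.options = ntohs(h.options);

    if ( h.marker != EX_IHDR_MARKER )
        return -1;
    if ( h.id != EX_IHDR_ID )
        return -2;
    if ( h.version != EX_IHDR_VERSION )
        return -3;
    if ( h.options & EX_IHDR_OPT_BIG_ENDIAN )
        return -4;

    return 0;
}

/* Build a well-formed on-the-wire header carrying the given version. */
static struct ex_ihdr ex_make_ihdr(uint32_t version)
{
    struct ex_ihdr h = { 0 };

    h.marker = EX_IHDR_MARKER;
    h.id = htonl(EX_IHDR_ID);
    h.version = htonl(version);
    h.options = htons(0);

    return h;
}
```

Returning a distinct code per failed check is a choice made for the sketch; the patch itself logs a message and returns -1 in every case.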
diff --git a/tools/libxc/saverestore/restore_x86_pv.c b/tools/libxc/saverestore/restore_x86_pv.c
new file mode 100644
index 0000000..0659244
--- /dev/null
+++ b/tools/libxc/saverestore/restore_x86_pv.c
@@ -0,0 +1,977 @@
+#include <assert.h>
+#include <arpa/inet.h>
+
+#include "common_x86_pv.h"
+
+static int expand_p2m(struct context *ctx, unsigned long max_pfn)
+{
+ xc_interface *xch = ctx->xch;
+ unsigned long old_max = ctx->x86_pv.max_pfn, i;
+ unsigned long end_frame = (max_pfn + ctx->x86_pv.fpp) / ctx->x86_pv.fpp;
+ unsigned long old_end_frame = (old_max + ctx->x86_pv.fpp) / ctx->x86_pv.fpp;
+ xen_pfn_t *p2m = NULL, *p2m_pfns = NULL;
+ uint32_t *pfn_types = NULL;
+ size_t p2msz, p2m_pfnsz, pfn_typesz;
+
+ /* We expect expand_p2m to be called exactly once, expanding from 0 to the
+ * domain's max, but assert some sanity */
+ assert(max_pfn > old_max);
+
+ p2msz = (max_pfn + 1) * ctx->x86_pv.width;
+ p2m = realloc(ctx->x86_pv.p2m, p2msz);
+ if ( !p2m )
+ {
+ ERROR("Failed to (re)alloc %zu bytes for p2m", p2msz);
+ return -1;
+ }
+ ctx->x86_pv.p2m = p2m;
+
+ pfn_typesz = (max_pfn + 1) * sizeof *pfn_types;
+ pfn_types = realloc(ctx->x86_pv.pfn_types, pfn_typesz);
+ if ( !pfn_types )
+ {
+ ERROR("Failed to (re)alloc %zu bytes for pfn_types", pfn_typesz);
+ return -1;
+ }
+ ctx->x86_pv.pfn_types = pfn_types;
+
+ p2m_pfnsz = (end_frame + 1) * sizeof *p2m_pfns;
+ p2m_pfns = realloc(ctx->x86_pv.p2m_pfns, p2m_pfnsz);
+ if ( !p2m_pfns )
+ {
+ ERROR("Failed to (re)alloc %zu bytes for p2m frame list", p2m_pfnsz);
+ return -1;
+ }
+ ctx->x86_pv.p2m_frames = end_frame;
+ ctx->x86_pv.p2m_pfns = p2m_pfns;
+
+ ctx->x86_pv.max_pfn = max_pfn;
+ for ( i = (old_max ? old_max + 1 : 0); i <= max_pfn; ++i )
+ {
+ set_p2m(ctx, i, INVALID_MFN);
+ ctx->x86_pv.pfn_types[i] = 0;
+ }
+
+ for ( i = (old_end_frame ? old_end_frame + 1 : 0); i <= end_frame; ++i )
+ ctx->x86_pv.p2m_pfns[i] = INVALID_MFN;
+
+ DPRINTF("Expanded p2m from %#lx to %#lx", old_max, max_pfn);
+ return 0;
+}
+
+static int pin_pagetables(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ unsigned long i;
+ struct mmuext_op pin;
+
+ DPRINTF("Pinning pagetables");
+
+ for ( i = 0; i <= ctx->x86_pv.max_pfn; ++i )
+ {
+ if ( (ctx->x86_pv.pfn_types[i] & XEN_DOMCTL_PFINFO_LPINTAB) == 0 )
+ continue;
+
+ switch ( ctx->x86_pv.pfn_types[i] & XEN_DOMCTL_PFINFO_LTABTYPE_MASK )
+ {
+ case XEN_DOMCTL_PFINFO_L1TAB:
+ pin.cmd = MMUEXT_PIN_L1_TABLE;
+ break;
+ case XEN_DOMCTL_PFINFO_L2TAB:
+ pin.cmd = MMUEXT_PIN_L2_TABLE;
+ break;
+ case XEN_DOMCTL_PFINFO_L3TAB:
+ pin.cmd = MMUEXT_PIN_L3_TABLE;
+ break;
+ case XEN_DOMCTL_PFINFO_L4TAB:
+ pin.cmd = MMUEXT_PIN_L4_TABLE;
+ break;
+ default:
+ continue;
+ }
+
+ pin.arg1.mfn = pfn_to_mfn(ctx, i);
+
+ if ( xc_mmuext_op(xch, &pin, 1, ctx->domid) != 0 )
+ {
+ PERROR("Failed to pin page table for pfn %#lx", i);
+ return -1;
+ }
+
+ }
+
+ return 0;
+}
+
+static int process_start_info(struct context *ctx, vcpu_guest_context_any_t *vcpu)
+{
+ xc_interface *xch = ctx->xch;
+ xen_pfn_t pfn, mfn;
+ start_info_any_t *guest_start_info = NULL;
+ int rc = -1;
+
+ pfn = GET_FIELD(ctx, vcpu, user_regs.edx);
+
+ if ( pfn > ctx->x86_pv.max_pfn )
+ {
+ ERROR("Start Info pfn %#lx out of range", pfn);
+ goto err;
+ }
+ else if ( ctx->x86_pv.pfn_types[pfn] != XEN_DOMCTL_PFINFO_NOTAB )
+ {
+ ERROR("Start Info pfn %#lx has bad type %lu", pfn,
+ ctx->x86_pv.pfn_types[pfn] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+ goto err;
+ }
+
+ mfn = pfn_to_mfn(ctx, pfn);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Start Info has bad MFN");
+ pseudophysmap_walk(ctx, mfn);
+ goto err;
+ }
+
+ guest_start_info = xc_map_foreign_range(
+ xch, ctx->domid, PAGE_SIZE, PROT_READ | PROT_WRITE, mfn);
+ if ( !guest_start_info )
+ {
+ PERROR("Failed to map Start Info at mfn %#lx", mfn);
+ goto err;
+ }
+
+ /* Deal with xenstore stuff */
+ pfn = GET_FIELD(ctx, guest_start_info, store_mfn);
+ if ( pfn > ctx->x86_pv.max_pfn )
+ {
+ ERROR("XenStore pfn %#lx out of range", pfn);
+ goto err;
+ }
+
+ mfn = pfn_to_mfn(ctx, pfn);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("XenStore pfn has bad MFN");
+ pseudophysmap_walk(ctx, mfn);
+ goto err;
+ }
+
+ ctx->restore.xenstore_mfn = mfn;
+ SET_FIELD(ctx, guest_start_info, store_mfn, mfn);
+ SET_FIELD(ctx, guest_start_info, store_evtchn, ctx->restore.xenstore_evtchn);
+
+
+ /* Deal with console stuff */
+ pfn = GET_FIELD(ctx, guest_start_info, console.domU.mfn);
+ if ( pfn > ctx->x86_pv.max_pfn )
+ {
+ ERROR("Console pfn %#lx out of range", pfn);
+ goto err;
+ }
+
+ mfn = pfn_to_mfn(ctx, pfn);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Console pfn has bad MFN");
+ pseudophysmap_walk(ctx, mfn);
+ goto err;
+ }
+
+ ctx->restore.console_mfn = mfn;
+ SET_FIELD(ctx, guest_start_info, console.domU.mfn, mfn);
+ SET_FIELD(ctx, guest_start_info, console.domU.evtchn, ctx->restore.console_evtchn);
+
+ /* Set other information */
+ SET_FIELD(ctx, guest_start_info, nr_pages, ctx->x86_pv.max_pfn + 1);
+ SET_FIELD(ctx, guest_start_info, shared_info,
+ ctx->dominfo.shared_info_frame << PAGE_SHIFT);
+ SET_FIELD(ctx, guest_start_info, flags, 0);
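+ /* edx still holds the Start Info pfn (mfn was reused above): convert it */
+ SET_FIELD(ctx, vcpu, user_regs.edx, pfn_to_mfn(ctx, GET_FIELD(ctx, vcpu, user_regs.edx)));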
+ rc = 0;
+
+err:
+ if ( guest_start_info )
+ munmap(guest_start_info, PAGE_SIZE);
+
+ return rc;
+}
+
+static int update_guest_p2m(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ xen_pfn_t mfn, pfn, *guest_p2m = NULL;
+ unsigned i;
+ int rc = -1;
+
+ for ( i = 0; i < ctx->x86_pv.p2m_frames; ++i )
+ {
+ pfn = ctx->x86_pv.p2m_pfns[i];
+
+ if ( pfn > ctx->x86_pv.max_pfn )
+ {
+ ERROR("pfn (%#lx) for p2m_frame_list[%u] out of range",
+ pfn, i);
+ goto err;
+ }
+ else if ( ctx->x86_pv.pfn_types[pfn] != XEN_DOMCTL_PFINFO_NOTAB )
+ {
+ ERROR("pfn (%#lx) for p2m_frame_list[%u] has bad type %lu", pfn, i,
+ ctx->x86_pv.pfn_types[pfn] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+ goto err;
+ }
+
+ mfn = pfn_to_mfn(ctx, pfn);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("p2m_frame_list[%u] has bad MFN", i);
+ pseudophysmap_walk(ctx, mfn);
+ goto err;
+ }
+
+ ctx->x86_pv.p2m_pfns[i] = mfn;
+ }
+
+ guest_p2m = xc_map_foreign_pages(xch, ctx->domid, PROT_WRITE,
+ ctx->x86_pv.p2m_pfns,
+ ctx->x86_pv.p2m_frames );
+ if ( !guest_p2m )
+ {
+ PERROR("Failed to map p2m frames");
+ goto err;
+ }
+
+ memcpy(guest_p2m, ctx->x86_pv.p2m,
+ (ctx->x86_pv.max_pfn + 1) * ctx->x86_pv.width);
+ rc = 0;
+ err:
+ if ( guest_p2m )
+ munmap(guest_p2m, ctx->x86_pv.p2m_frames * PAGE_SIZE);
+
+ return rc;
+}
+
+static int populate_pfn(struct context *ctx, xen_pfn_t pfn)
+{
+ xc_interface *xch = ctx->xch;
+ xen_pfn_t mfn = pfn;
+ int rc;
+
+ if ( pfn_to_mfn(ctx, pfn) != INVALID_MFN )
+ return 0;
+
+ rc = xc_domain_populate_physmap_exact(xch, ctx->domid, 1, 0, 0, &mfn);
+ if ( rc )
+ {
+ ERROR("Failed to populate physmap");
+ return rc;
+ }
+
+ set_p2m(ctx, pfn, mfn);
+
+ /* This *really* should be true by now, or something has gone very wrong */
+ assert(mfn_in_pseudophysmap(ctx, mfn));
+
+ return 0;
+}
+
+static int localise_pagetable(struct context *ctx, uint64_t *table, xen_pfn_t type)
+{
+ xc_interface *xch = ctx->xch;
+ uint64_t pte;
+ unsigned i;
+
+ type &= XEN_DOMCTL_PFINFO_LTABTYPE_MASK;
+
+ for ( i = 0; i < (PAGE_SIZE / sizeof(uint64_t)); ++i )
+ {
+ pte = table[i];
+
+ if ( pte & _PAGE_PRESENT )
+ {
+ xen_pfn_t mfn, pfn;
+
+ pfn = pte_to_frame(ctx, pte);
+ mfn = pfn_to_mfn(ctx, pfn);
+
+ if ( mfn == INVALID_MFN )
+ {
+ if ( populate_pfn(ctx, pfn) )
+ return -1;
+
+ mfn = pfn_to_mfn(ctx, pfn);
+ }
+
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("Bad MFN for L%lu[%u]",
+ type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT, i);
+ pseudophysmap_walk(ctx, mfn);
+ errno = ERANGE;
+ return -1;
+ }
+
+ update_pte(ctx, &pte, mfn);
+
+ table[i] = pte;
+ }
+ }
+
+ return 0;
+}
+
+static int handle_end(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+
+ DPRINTF("End record");
+ return 0;
+}
+
+static int handle_page_data(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ struct rec_page_data_header *page = rec->data;
+ xen_pfn_t mfn, pfn, type;
+ void *guest_page = NULL;
+ int rc = -1, err;
+
+ if ( rec->length < sizeof *page )
+ {
+ ERROR("PAGE_DATA record truncated: length %"PRIu32", min %zu",
+ rec->length, sizeof *page);
+ goto cleanup;
+ }
+ else if ( page->count != 1 )
+ {
+ // TODO
+ ERROR("Unable to handle batched pages (yet)");
+ goto cleanup;
+ }
+
+ pfn = page->pfn[0] & PAGE_DATA_PFN_MASK;
+ if ( pfn > ctx->x86_pv.max_pfn )
+ {
+ ERROR("pfn %#lx outside domain maximum (%#lx)", pfn, ctx->x86_pv.max_pfn);
+ goto cleanup;
+ }
+
+ type = (page->pfn[0] & PAGE_DATA_TYPE_MASK) >> 32;
+ if ( ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) >= 5) &&
+ ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) <= 8) )
+ {
+ ERROR("Invalid type %#lx for pfn %#lx", type, pfn);
+ goto cleanup;
+ }
+
+ ctx->x86_pv.pfn_types[pfn] = type;
+
+ switch ( type )
+ {
+ case XEN_DOMCTL_PFINFO_XTAB:
+ case XEN_DOMCTL_PFINFO_BROKEN:
+ /* No page data - leave alone */
+ rc = 0;
+ goto cleanup;
+ }
+
+ /* All other page types, need to allocate */
+ rc = populate_pfn(ctx, pfn);
+ if ( rc )
+ goto cleanup;
+
+ mfn = pfn_to_mfn(ctx, pfn);
+
+ guest_page = xc_map_foreign_bulk(
+ xch, ctx->domid, PROT_READ | PROT_WRITE, &mfn, &err, 1);
+ if ( !guest_page || err )
+ {
+ PERROR("Unable to map mfn %#lx (err %d)", mfn, err);
+ rc = -1;
+ goto cleanup;
+ }
+
+ /* XALLOC also has no page data */
+ if ( type != XEN_DOMCTL_PFINFO_XALLOC )
+ memcpy(guest_page, &page->pfn[1], PAGE_SIZE);
+
+ /* Pagetables need to be localised */
+ if ( ((type & XEN_DOMCTL_PFINFO_LTABTYPE_MASK) >= XEN_DOMCTL_PFINFO_L1TAB &&
+ (type & XEN_DOMCTL_PFINFO_LTABTYPE_MASK) <= XEN_DOMCTL_PFINFO_L4TAB) )
+ {
+ rc = localise_pagetable(ctx, guest_page, type);
+ if ( rc )
+ goto cleanup;
+ }
+
+ rc = 0;
+
+ cleanup:
+ if ( guest_page )
+ munmap(guest_page, PAGE_SIZE);
+
+ return rc;
+}
+
+static int handle_x86_pv_info(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ struct rec_x86_pv_info *info = rec->data;
+
+ if ( rec->length < sizeof *info )
+ {
+ ERROR("X86_PV_INFO record truncated: length %"PRIu32", expected %zu",
+ rec->length, sizeof *info);
+ return -1;
+ }
+ else if ( info->guest_width != 4 &&
+ info->guest_width != 8 )
+ {
+ ERROR("Unexpected guest width %"PRIu32", Expected 4 or 8",
+ info->guest_width);
+ return -1;
+ }
+ else if ( info->guest_width != ctx->x86_pv.width )
+ {
+ int rc;
+ struct xen_domctl domctl;
+
+ /* try to set address size, domain is always created 64 bit */
+ memset(&domctl, 0, sizeof(domctl));
+ domctl.domain = ctx->domid;
+ domctl.cmd = XEN_DOMCTL_set_address_size;
+ domctl.u.address_size.size = info->guest_width * 8;
+ rc = do_domctl(xch, &domctl);
+ if ( rc != 0 )
+ {
+ ERROR("Width of guest in stream (%"PRIu32
+ " bits) differs from existing domain (%"PRIu32" bits)",
+ info->guest_width * 8, ctx->x86_pv.width * 8);
+ return -1;
+ }
+
+ /* domain information has changed, so refresh it */
+ rc = x86_pv_domain_info(ctx);
+ if ( rc != 0 )
+ {
+ ERROR("Unable to refresh guest information");
+ return -1;
+ }
+ }
+ else if ( info->pt_levels != 3 &&
+ info->pt_levels != 4 )
+ {
+ ERROR("Unexpected guest levels %"PRIu32", Expected 3 or 4",
+ info->pt_levels);
+ return -1;
+ }
+ else if ( info->pt_levels != ctx->x86_pv.levels )
+ {
+ ERROR("Levels of guest in stream (%"PRIu32
+ ") differs from existing domain (%"PRIu32")",
+ info->pt_levels, ctx->x86_pv.levels);
+ return -1;
+ }
+
+ DPRINTF("X86_PV_INFO record: %d bits, %d levels",
+ ctx->x86_pv.width * 8, ctx->x86_pv.levels);
+ return 0;
+}
+
+static int handle_x86_pv_p2m_frames(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ struct rec_x86_pv_p2m_frames *data = rec->data;
+ unsigned start, end, x;
+ int rc;
+
+ if ( rec->length < sizeof *data )
+ {
+ ERROR("X86_PV_P2M_FRAMES record truncated: length %"PRIu32", min %zu",
+ rec->length, sizeof *data + sizeof(uint64_t));
+ return -1;
+ }
+ else if ( data->start_pfn > data->end_pfn )
+ {
+ ERROR("Start pfn in stream (%#"PRIx32") exceeds End (%#"PRIx32")",
+ data->start_pfn, data->end_pfn);
+ return -1;
+ }
+
+ start = data->start_pfn / ctx->x86_pv.fpp;
+ end = data->end_pfn / ctx->x86_pv.fpp + 1;
+
+ if ( rec->length != sizeof *data + ((end - start) * sizeof (uint64_t)) )
+ {
+ ERROR("X86_PV_P2M_FRAMES record wrong size: start_pfn %#"PRIx32
+ ", end_pfn %#"PRIx32", length %"PRIu32
+ ", expected %zu + (%u - %u) * %zu",
+ data->start_pfn, data->end_pfn, rec->length,
+ sizeof *data, end, start, sizeof(uint64_t));
+ return -1;
+ }
+
+ if ( data->end_pfn > ctx->x86_pv.max_pfn )
+ {
+ rc = expand_p2m(ctx, data->end_pfn);
+ if ( rc )
+ return rc;
+ }
+
+ for ( x = 0; x < (end - start); ++x )
+ ctx->x86_pv.p2m_pfns[start + x] = data->p2m_pfns[x];
+
+ DPRINTF("X86_PV_P2M_FRAMES record: GFNs %#"PRIx32"->%#"PRIx32,
+ data->start_pfn, data->end_pfn);
+ return 0;
+}
+
+static int handle_x86_pv_vcpu_basic(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ struct rec_x86_pv_vcpu *vhdr = rec->data;
+ vcpu_guest_context_any_t vcpu;
+ size_t vcpusz = ctx->x86_pv.width == 8 ? sizeof vcpu.x64 : sizeof vcpu.x32;
+ xen_pfn_t pfn, mfn;
+ unsigned long tmp;
+ unsigned i;
+ int rc = -1;
+
+ if ( rec->length <= sizeof *vhdr )
+ {
+ ERROR("X86_PV_VCPU_BASIC record truncated: length %"PRIu32", min %zu",
+ rec->length, sizeof *vhdr + 1);
+ goto err;
+ }
+ else if ( rec->length != sizeof *vhdr + vcpusz )
+ {
+ ERROR("X86_PV_VCPU_BASIC record wrong size: length %"PRIu32
+ ", expected %zu", rec->length, sizeof *vhdr + vcpusz);
+ goto err;
+ }
+ else if ( vhdr->vcpu_id > ctx->dominfo.max_vcpu_id )
+ {
+ ERROR("X86_PV_VCPU_BASIC record vcpu_id (%"PRIu32
+ ") exceeds domain max (%u)",
+ vhdr->vcpu_id, ctx->dominfo.max_vcpu_id);
+ goto err;
+ }
+
+ memcpy(&vcpu, &vhdr->context, vcpusz);
+
+ SET_FIELD(ctx, &vcpu, flags, GET_FIELD(ctx, &vcpu, flags) | VGCF_online);
+
+ /* Vcpu 0 is special: Convert the suspend record to an MFN */
+ if ( vhdr->vcpu_id == 0 )
+ {
+ rc = process_start_info(ctx, &vcpu);
+ if ( rc )
+ return rc;
+ rc = -1;
+ }
+
+ tmp = GET_FIELD(ctx, &vcpu, gdt_ents);
+ if ( tmp > 8192 )
+ {
+ ERROR("GDT entry count (%lu) out of range", tmp);
+ errno = ERANGE;
+ goto err;
+ }
+
+ /* Convert GDT frames to MFNs */
+ for ( i = 0; (i * 512) < tmp; ++i )
+ {
+ pfn = GET_FIELD(ctx, &vcpu, gdt_frames[i]);
+ if ( pfn >= ctx->x86_pv.max_pfn )
+ {
+ ERROR("GDT frame %u (pfn %#lx) out of range", i, pfn);
+ goto err;
+ }
+ else if ( ctx->x86_pv.pfn_types[pfn] != XEN_DOMCTL_PFINFO_NOTAB )
+ {
+ ERROR("GDT frame %u (pfn %#lx) has bad type %lu", i, pfn,
+ ctx->x86_pv.pfn_types[pfn] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+ goto err;
+ }
+
+ mfn = pfn_to_mfn(ctx, pfn);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("GDT frame %u has bad MFN", i);
+ pseudophysmap_walk(ctx, mfn);
+ goto err;
+ }
+
+ SET_FIELD(ctx, &vcpu, gdt_frames[i], mfn);
+ }
+
+ /* Convert CR3 to an MFN */
+ pfn = cr3_to_mfn(ctx, GET_FIELD(ctx, &vcpu, ctrlreg[3]));
+ if ( pfn >= ctx->x86_pv.max_pfn )
+ {
+ ERROR("cr3 (pfn %#lx) out of range", pfn);
+ goto err;
+ }
+ else if ( (ctx->x86_pv.pfn_types[pfn] & XEN_DOMCTL_PFINFO_LTABTYPE_MASK ) !=
+ (((xen_pfn_t)ctx->x86_pv.levels) << XEN_DOMCTL_PFINFO_LTAB_SHIFT) )
+ {
+ ERROR("cr3 (pfn %#lx) has bad type %lu, expected %lu", pfn,
+ ctx->x86_pv.pfn_types[pfn] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT,
+ ctx->x86_pv.levels);
+ goto err;
+ }
+
+ mfn = pfn_to_mfn(ctx, pfn);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("cr3 has bad MFN");
+ pseudophysmap_walk(ctx, mfn);
+ goto err;
+ }
+
+ SET_FIELD(ctx, &vcpu, ctrlreg[3], mfn_to_cr3(ctx, mfn));
+
+ /* 64bit guests: Convert CR1 (guest pagetables) to MFN */
+ if ( ctx->x86_pv.levels == 4 && (vcpu.x64.ctrlreg[1] & 1) )
+ {
+ pfn = vcpu.x64.ctrlreg[1] >> PAGE_SHIFT;
+
+ if ( pfn >= ctx->x86_pv.max_pfn )
+ {
+ ERROR("cr1 (pfn %#lx) out of range", pfn);
+ goto err;
+ }
+ else if ( (ctx->x86_pv.pfn_types[pfn] & XEN_DOMCTL_PFINFO_LTABTYPE_MASK) !=
+ (((xen_pfn_t)ctx->x86_pv.levels) << XEN_DOMCTL_PFINFO_LTAB_SHIFT) )
+ {
+ ERROR("cr1 (pfn %#lx) has bad type %lu, expected %lu", pfn,
+ ctx->x86_pv.pfn_types[pfn] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT,
+ ctx->x86_pv.levels);
+ goto err;
+ }
+
+ mfn = pfn_to_mfn(ctx, pfn);
+ if ( !mfn_in_pseudophysmap(ctx, mfn) )
+ {
+ ERROR("cr1 has bad MFN");
+ pseudophysmap_walk(ctx, mfn);
+ goto err;
+ }
+
+ vcpu.x64.ctrlreg[1] = (uint64_t)mfn << PAGE_SHIFT;
+ }
+
+ if ( xc_vcpu_setcontext(xch, ctx->domid, vhdr->vcpu_id, &vcpu) )
+ {
+ PERROR("Failed to set vcpu%"PRIu32"'s basic info", vhdr->vcpu_id);
+ goto err;
+ }
+
+ rc = 0;
+ DPRINTF("vcpu%d X86_PV_VCPU_BASIC record", vhdr->vcpu_id);
+ err:
+ return rc;
+}
+
+static int handle_x86_pv_vcpu_extended(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ struct rec_x86_pv_vcpu *vcpu = rec->data;
+ DECLARE_DOMCTL;
+
+ if ( rec->length <= sizeof *vcpu )
+ {
+ ERROR("X86_PV_VCPU_EXTENDED record truncated: length %"PRIu32", min %zu",
+ rec->length, sizeof *vcpu + 1);
+ return -1;
+ }
+ else if ( rec->length > sizeof *vcpu + 128 )
+ {
+ ERROR("X86_PV_VCPU_EXTENDED record too long: length %"PRIu32", max %zu",
+ rec->length, sizeof *vcpu + 128);
+ return -1;
+ }
+ else if ( vcpu->vcpu_id > ctx->dominfo.max_vcpu_id )
+ {
+ ERROR("X86_PV_VCPU_EXTENDED record vcpu_id (%"PRIu32
+ ") exceeds domain max (%u)",
+ vcpu->vcpu_id, ctx->dominfo.max_vcpu_id);
+ return -1;
+ }
+
+ domctl.cmd = XEN_DOMCTL_set_ext_vcpucontext;
+ domctl.domain = ctx->domid;
+ memcpy(&domctl.u.ext_vcpucontext, &vcpu->context, rec->length - sizeof *vcpu);
+
+ if ( xc_domctl(xch, &domctl) != 0 )
+ {
+ PERROR("Failed to set vcpu%"PRIu32"'s extended info", vcpu->vcpu_id);
+ return -1;
+ }
+
+ DPRINTF("vcpu%d X86_PV_VCPU_EXTENDED record", vcpu->vcpu_id);
+ return 0;
+}
+
+static int handle_x86_pv_vcpu_xsave(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ struct rec_x86_pv_vcpu_xsave *vcpu = rec->data;
+ int rc;
+ DECLARE_DOMCTL;
+ DECLARE_HYPERCALL_BUFFER(void, buffer);
+ size_t buffersz;
+
+ if ( rec->length <= sizeof *vcpu )
+ {
+ ERROR("X86_PV_VCPU_XSAVE record truncated: length %"PRIu32", min %zu",
+ rec->length, sizeof *vcpu + 1);
+ return -1;
+ }
+ else if ( vcpu->vcpu_id > ctx->dominfo.max_vcpu_id )
+ {
+ ERROR("X86_PV_VCPU_XSAVE record vcpu_id (%"PRIu32
+ ") exceeds domain max (%u)",
+ vcpu->vcpu_id, ctx->dominfo.max_vcpu_id);
+ return -1;
+ }
+
+ buffersz = rec->length - sizeof *vcpu;
+ buffer = xc_hypercall_buffer_alloc(xch, buffer, buffersz);
+ if ( !buffer )
+ {
+ ERROR("Unable to allocate %zu bytes for xsave hypercall buffer",
+ buffersz);
+ return -1;
+ }
+
+ domctl.cmd = XEN_DOMCTL_setvcpuextstate;
+ domctl.domain = ctx->domid;
+ domctl.u.vcpuextstate.vcpu = vcpu->vcpu_id;
+ domctl.u.vcpuextstate.xfeature_mask = vcpu->xfeature_mask;
+ domctl.u.vcpuextstate.size = buffersz;
+ set_xen_guest_handle(domctl.u.vcpuextstate.buffer, buffer);
+
+ rc = xc_domctl(xch, &domctl);
+
+ xc_hypercall_buffer_free(xch, buffer);
+
+ if ( rc )
+ {
+ PERROR("Failed to set vcpu%"PRIu32"'s xsave info", vcpu->vcpu_id);
+ return rc;
+ }
+ else
+ {
+ DPRINTF("vcpu%d X86_PV_VCPU_XSAVE record", vcpu->vcpu_id);
+ return 0;
+ }
+}
+
+static int handle_x86_pv_shared_info(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ unsigned i;
+ int rc = -1;
+ shared_info_any_t *guest_shared_info = NULL;
+ shared_info_any_t *stream_shared_info = rec->data;
+
+ if ( rec->length != PAGE_SIZE )
+ {
+ ERROR("X86_PV_SHARED_INFO record wrong size: length %"PRIu32
+ ", expected %u", rec->length, PAGE_SIZE);
+ goto err;
+ }
+
+ guest_shared_info = xc_map_foreign_range(
+ xch, ctx->domid, PAGE_SIZE, PROT_READ | PROT_WRITE,
+ ctx->dominfo.shared_info_frame);
+ if ( !guest_shared_info )
+ {
+ PERROR("Failed to map Shared Info at mfn %#lx",
+ ctx->dominfo.shared_info_frame);
+ goto err;
+ }
+
+ MEMCPY_FIELD(ctx, guest_shared_info, stream_shared_info, vcpu_info);
+ MEMCPY_FIELD(ctx, guest_shared_info, stream_shared_info, arch);
+
+ SET_FIELD(ctx, guest_shared_info, arch.pfn_to_mfn_frame_list_list, 0);
+
+ MEMSET_ARRAY_FIELD(ctx, guest_shared_info, evtchn_pending, 0);
+ for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
+ SET_FIELD(ctx, guest_shared_info, vcpu_info[i].evtchn_pending_sel, 0);
+
+ MEMSET_ARRAY_FIELD(ctx, guest_shared_info, evtchn_mask, 0xff);
+
+ rc = 0;
+ err:
+
+ if ( guest_shared_info )
+ munmap(guest_shared_info, PAGE_SIZE);
+
+ return rc;
+}
+static int handle_tsc_info(struct context *ctx, struct record *rec)
+{
+ xc_interface *xch = ctx->xch;
+ struct rec_tsc_info *tsc = rec->data;
+
+ if ( rec->length != sizeof *tsc )
+ {
+ ERROR("TSC_INFO record wrong size: length %"PRIu32", expected %zu",
+ rec->length, sizeof *tsc);
+ return -1;
+ }
+
+ if ( xc_domain_set_tsc_info(xch, ctx->domid, tsc->mode,
+ tsc->nsec, tsc->khz, tsc->incarnation) )
+ {
+ PERROR("Unable to set TSC information");
+ return -1;
+ }
+
+ return 0;
+}
+
+int restore_x86_pv(struct context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ struct record rec;
+ int rc;
+
+ IPRINTF("In experimental %s", __func__);
+
+ if ( ctx->restore.guest_type != DHDR_TYPE_x86_pv )
+ {
+ ERROR("Unable to restore %s domain into an x86_pv domain",
+ dhdr_type_to_str(ctx->restore.guest_type));
+ return -1;
+ }
+ else if ( ctx->restore.guest_page_size != 4096 )
+ {
+ ERROR("Invalid page size %d for x86_pv domains",
+ ctx->restore.guest_page_size);
+ return -1;
+ }
+
+ rc = x86_pv_domain_info(ctx);
+ if ( rc )
+ goto err;
+
+ rc = x86_pv_map_m2p(ctx);
+ if ( rc )
+ goto err;
+
+ do
+ {
+ rc = read_record(ctx, &rec);
+ if ( rc )
+ goto err;
+
+ switch ( rec.type )
+ {
+ case REC_TYPE_end:
+ rc = handle_end(ctx, &rec);
+ break;
+
+ case REC_TYPE_page_data:
+ rc = handle_page_data(ctx, &rec);
+ break;
+
+ case REC_TYPE_x86_pv_info:
+ rc = handle_x86_pv_info(ctx, &rec);
+ break;
+
+ case REC_TYPE_x86_pv_p2m_frames:
+ rc = handle_x86_pv_p2m_frames(ctx, &rec);
+ break;
+
+ case REC_TYPE_x86_pv_vcpu_basic:
+ rc = handle_x86_pv_vcpu_basic(ctx, &rec);
+ break;
+
+ case REC_TYPE_x86_pv_vcpu_extended:
+ rc = handle_x86_pv_vcpu_extended(ctx, &rec);
+ break;
+
+ case REC_TYPE_x86_pv_vcpu_xsave:
+ rc = handle_x86_pv_vcpu_xsave(ctx, &rec);
+ break;
+
+ case REC_TYPE_x86_pv_shared_info:
+ rc = handle_x86_pv_shared_info(ctx, &rec);
+ break;
+
+ case REC_TYPE_tsc_info:
+ rc = handle_tsc_info(ctx, &rec);
+ break;
+
+ default:
+ if ( rec.type & REC_TYPE_optional )
+ {
+ IPRINTF("Ignoring optional record (0x%"PRIx32", %s)",
+ rec.type, rec_type_to_str(rec.type));
+ rc = 0;
+ break;
+ }
+
+ ERROR("Invalid record type (0x%"PRIx32", %s) for x86_pv domains",
+ rec.type, rec_type_to_str(rec.type));
+ rc = -1;
+ break;
+ }
+
+ free(rec.data);
+ if ( rc )
+ goto err;
+
+ } while ( rec.type != REC_TYPE_end );
+
+ IPRINTF("Finished reading records");
+
+ rc = pin_pagetables(ctx);
+ if ( rc )
+ goto err;
+
+ rc = update_guest_p2m(ctx);
+ if ( rc )
+ goto err;
+
+ rc = xc_dom_gnttab_seed(xch, ctx->domid,
+ ctx->restore.console_mfn,
+ ctx->restore.xenstore_mfn,
+ ctx->restore.console_domid,
+ ctx->restore.xenstore_domid);
+ if ( rc )
+ {
+ PERROR("Failed to seed grant table");
+ goto err;
+ }
+
+ /* all done */
+ IPRINTF("All Done");
+ assert(!rc);
+ goto cleanup;
+
+ err:
+ assert(rc);
+ cleanup:
+
+ free(ctx->x86_pv.p2m_pfns);
+
+ if ( ctx->x86_pv.m2p )
+ munmap(ctx->x86_pv.m2p, ctx->x86_pv.nr_m2p_frames * PAGE_SIZE);
+
+ return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
1.7.10.4
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
` (5 preceding siblings ...)
2014-04-09 18:28 ` [PATCH 6/6] tools/libxc: x86 pv restore implementation Andrew Cooper
@ 2014-04-10 10:42 ` Ian Campbell
2014-04-10 11:21 ` Andrew Cooper
2014-04-23 13:47 ` Ian Campbell
7 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2014-04-10 10:42 UTC (permalink / raw)
To: Andrew Cooper
Cc: Keir Fraser, Tim Deegan, Ian Jackson, Xen-devel, Frediano Ziglio,
David Vrabel, Jan Beulich
On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
> Some design decisions have been take very deliberately (e.g. splitting the
> logic for PV and hvm migration) while others have been more along the lines of
> "I think its a sensible thing to do given a lack of any evidence/opinion to
> the contrary".
Is there some indication of which is which?
Should we check in the design/spec which was previously posted as part
of this?
> The error handling is known to only semi-consistent. Functions return 0 for
> success and non-zero for failure. This is typically -1, although errno is not
> always relevant. However, the logging messages should all be relevant and
> correct. Making this properly consistent will involve wider effort across all
> of libxc.
It would be useful if the new code was correct at least so far as its
own behaviour went (meaning no need to fix functions it calls as part of
this).
> An area needing discussing is how to do v1 -> v2 transformations for a one-time
> upgrade. There is a (very basic currently) python script which can pick a v1
> stream, and a separate python library to write v2 streams.
>
> One option would be to combine these two into a program which takes two fds,
> which libxc can exec() out to. There is deliberate flexibility in the v2
> restore code which allows a v1 -> v2 transformation on a stream without seeking.
forking/execing in libxc might be problematic, fitting it into libxl
might be easier, since it has infrastructure for that sort of thing.
Or maybe the fact that most of this already happens in a process which
libxl spawns for that purpose means that libxc can safely fork because
the application in that case is under our control.
Ian.
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-10 10:42 ` [PATCH 0/6] [VERY RFC] Migration Stream v2 Ian Campbell
@ 2014-04-10 11:21 ` Andrew Cooper
2014-04-10 13:05 ` Frediano Ziglio
2014-04-14 17:49 ` George Dunlap
0 siblings, 2 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-10 11:21 UTC (permalink / raw)
To: Ian Campbell
Cc: Keir Fraser, Tim Deegan, Ian Jackson, Xen-devel, Frediano Ziglio,
David Vrabel, Jan Beulich
On 10/04/14 11:42, Ian Campbell wrote:
> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
>> Some design decisions have been take very deliberately (e.g. splitting the
>> logic for PV and hvm migration) while others have been more along the lines of
>> "I think its a sensible thing to do given a lack of any evidence/opinion to
>> the contrary".
> Is there some indication of which is which?
Not really, given the clean rewrite, and also that it is only partially
complete.
>
> Should we check in the desigh/spec which was previously posted as part
> of this?
I knew I forgot something...
http://xenbits.xen.org/people/andrewcoop/domain-save-format-E.pdf
>
>> The error handling is known to only semi-consistent. Functions return 0 for
>> success and non-zero for failure. This is typically -1, although errno is not
>> always relevant. However, the logging messages should all be relevant and
>> correct. Making this properly consistent will involve wider effort across all
>> of libxc.
> It would be useful if the new code was correct at least so far as its
> own behaviour went (meaning no need to fix functions it calls as part of
> this).
libxc is too broken for that to be possible (including such gems as the
save_callbacks functions, which do not specify how to indicate success
or error, and which have developed at least 3 different flavours).
Currently, the state of play is "if you get non-0, something went wrong.
Please read the log for relevant information". Once we get a libxc_err_t
(or so, given a discussion down the pub) capable of expressing more
meaningful errors, most codepaths (including these new ones)
will need updating, although starting from a fairly-consistent position
will be much easier than not.
>
>> An area needing discussing is how to do v1 -> v2 transformations for a one-time
>> upgrade. There is a (very basic currently) python script which can pick a v1
>> stream, and a separate python library to write v2 streams.
>>
>> One option would be to combine these two into a program which takes two fds,
>> which libxc can exec() out to. There is deliberate flexibility in the v2
>> restore code which allows a v1 -> v2 transformation on a stream without seeking.
> forking/execing in libxc might be problematic, fitting it into libxl
> might be easier, since it has infrastructure for that sort of thing.
libxl is not the only user of libxc, and fixing it there would not help
the other consumers of libxc.
Furthermore, unless the consumer has an out-of-band detection method
(libxl can easily be made to have one, Xapi less easily, and I have no idea
about other consumers), xc_domain_restore() is the first piece of code
capable of detecting a legacy stream without needing to seek.
>
> Or maybe the fact that most of this already happens in a process which
> libxl spawns for that purpose means that libxc can safely fork because
> the application in that case is under our control.
Exactly the same for Xapi, which uses a separate process which functions
similarly to xc_save/restore but does domain build as well, which is why
I am hoping this is an acceptable way of fixing the problem.
~Andrew
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-10 11:21 ` Andrew Cooper
@ 2014-04-10 13:05 ` Frediano Ziglio
2014-04-10 13:49 ` Andrew Cooper
2014-04-14 17:49 ` George Dunlap
1 sibling, 1 reply; 22+ messages in thread
From: Frediano Ziglio @ 2014-04-10 13:05 UTC (permalink / raw)
To: Andrew Cooper
Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Xen-devel,
David Vrabel, Jan Beulich
On Thu, 2014-04-10 at 12:21 +0100, Andrew Cooper wrote:
> On 10/04/14 11:42, Ian Campbell wrote:
> > On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
....
>
> >
> >> The error handling is known to only semi-consistent. Functions return 0 for
> >> success and non-zero for failure. This is typically -1, although errno is not
> >> always relevant. However, the logging messages should all be relevant and
> >> correct. Making this properly consistent will involve wider effort across all
> >> of libxc.
> > It would be useful if the new code was correct at least so far as its
> > own behaviour went (meaning no need to fix functions it calls as part of
> > this).
>
> libxc is too broken for that to be possible, (including such gems as the
> save_callbacks functions which is not specified as to how to indicate
> success or error, and have developed at least 3 different flavours)
>
> Currently, the state of play is "if you get non0, something went wrong.
> Please read the log for relevant information" Once we get a libxc_err_t
> (or so, given a discussion down the pub) capable of expressing more
> meaningful error problems, most codepaths (including these new ones)
> will need updating, although starting from a fairly-consistent position
> will be much easier than not.
>
I agree with Ian, we should have a first patch that just replaces
xc_domain_save/xc_domain_restore. We can fix function return values and
error handling later in another set of patches.
> >
> >> An area needing discussing is how to do v1 -> v2 transformations for a one-time
> >> upgrade. There is a (very basic currently) python script which can pick a v1
> >> stream, and a separate python library to write v2 streams.
> >>
> >> One option would be to combine these two into a program which takes two fds,
> >> which libxc can exec() out to. There is deliberate flexibility in the v2
> >> restore code which allows a v1 -> v2 transformation on a stream without seeking.
> > forking/execing in libxc might be problematic, fitting it into libxl
> > might be easier, since it has infrastructure for that sort of thing.
>
> libxl is not the only user of libxc, and fixing it there would not help
> the other consumers of libxc.
>
> Furthermore, unless the consumer has an out-of-band detection method
> (libxl can easily be made to have, Xapi less so easy, and no idea about
> other consumers), xc_domain_restore() is the first piece of code capable
> of detecting a legacy stream without needing to seek.
>
> >
> > Or maybe the fact that most of this already happens in a process which
> > libxl spawns for that purpose means that libxc can safely fork because
> > the application in that case is under our control.
>
> Exactly the same for Xapi, which uses a separate process which functions
> similarly to xc_save/restore but does domain build as well, which is why
> I am hoping this is an acceptable way of fixing the problem.
>
IMO we should just replace xc_domain_restore. xc_domain_restore is
executed in the helper process, which can easily fork a process if
needed. The idea is that xc_domain_restore reads the first bytes of the
stream (4 is enough) and, if they are not valid, possibly forks and calls
python code (or whatever), passing the handle and the bytes read and
getting back a new handle with the converted data. If you think about it,
we'll probably call fork before any libxc function.
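A minimal sketch of that flow, under stated assumptions: looks_like_v2(),
write_all() and spawn_converter() are hypothetical names, the all-ones
marker is assumed rather than taken from the spec, and the child simply
copies bytes through where a real implementation would exec() a v1 -> v2
converter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PEEK_LEN 4

/* ASSUMPTION: a v2 stream begins with an all-ones marker, which a legacy
 * stream cannot start with.  Check the real value against the spec. */
static int looks_like_v2(const uint8_t *peek)
{
    static const uint8_t marker[PEEK_LEN] = { 0xff, 0xff, 0xff, 0xff };

    return memcmp(peek, marker, PEEK_LEN) == 0;
}

/* Write exactly len bytes, retrying on short writes. */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while ( len )
    {
        ssize_t n = write(fd, p, len);

        if ( n <= 0 )
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical: fork a child which replays the already-consumed bytes and
 * then the rest of in_fd into a pipe.  Returns the read end of the pipe
 * (the "new handle"), or -1 on failure.
 */
static int spawn_converter(int in_fd, const uint8_t *peek, size_t peek_len,
                           pid_t *child)
{
    int fds[2];
    uint8_t buf[4096];
    ssize_t n;

    assert(peek && child);

    if ( pipe(fds) )
        return -1;

    *child = fork();
    if ( *child < 0 )
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if ( *child == 0 )
    {
        /* Child: a real converter would exec() here instead of copying. */
        close(fds[0]);
        if ( write_all(fds[1], peek, peek_len) )
            _exit(1);
        while ( (n = read(in_fd, buf, sizeof(buf))) > 0 )
            if ( write_all(fds[1], buf, (size_t)n) )
                _exit(1);
        _exit(n < 0);
    }

    /* Parent: keep only the read end. */
    close(fds[1]);
    return fds[0];
}
```

The caller would read PEEK_LEN bytes, carry on with the original handle if
looks_like_v2() is true, and otherwise restore from the handle returned by
spawn_converter(), reaping the child with waitpid() afterwards.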
Another consideration for these patches should be the file names and code
split, bearing ARM migration in mind too. Too many functions are in
x86-specific files. For instance xc_domain_restore2 (in restore.c) should
call a restore_arch_pv instead of a restore_x86_pv.
Frediano
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-10 13:05 ` Frediano Ziglio
@ 2014-04-10 13:49 ` Andrew Cooper
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-10 13:49 UTC (permalink / raw)
To: Frediano Ziglio
Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Xen-devel,
David Vrabel, Jan Beulich
On 10/04/14 14:05, Frediano Ziglio wrote:
> On Thu, 2014-04-10 at 12:21 +0100, Andrew Cooper wrote:
>> On 10/04/14 11:42, Ian Campbell wrote:
>>> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
> ....
>>>> The error handling is known to only semi-consistent. Functions return 0 for
>>>> success and non-zero for failure. This is typically -1, although errno is not
>>>> always relevant. However, the logging messages should all be relevant and
>>>> correct. Making this properly consistent will involve wider effort across all
>>>> of libxc.
>>> It would be useful if the new code was correct at least so far as its
>>> own behaviour went (meaning no need to fix functions it calls as part of
>>> this).
>> libxc is too broken for that to be possible, (including such gems as the
>> save_callbacks functions which is not specified as to how to indicate
>> success or error, and have developed at least 3 different flavours)
>>
>> Currently, the state of play is "if you get non0, something went wrong.
>> Please read the log for relevant information" Once we get a libxc_err_t
>> (or so, given a discussion down the pub) capable of expressing more
>> meaningful error problems, most codepaths (including these new ones)
>> will need updating, although starting from a fairly-consistent position
>> will be much easier than not.
>>
> I agree with Ian, we should have a first patch that just replace
> xc_domain_save/xc_domain_restore. We can fix functions return and error
> later in another set of patches.
Which is surely agreeing with me... unless I am getting rather confused?
>
> Another consideration about these patches should be file names and code
> split thinking about ARM migration too. Too many functions are in x86
> specific files. For instance xc_domain_restore2 (in restore.c) should
> call a restore_arch_pv instead of a restore_x86_pv.
Very specifically not. xc_domain_restore2() currently contains
domain-agnostic restoration code. Yet-to-implement are
restore_x86_hvm() and restore_arm() which are expected to be in
restore_{x86_hvm,arm}.c. It is possible that some of the current helper
functions in restore_x86_pv.c should be promoted to common.
This is explicitly to undo the current rat's nest of code in
xc_domain_{save,restore}(). I don't know why the PV and HVM migration
code was merged together in the past, but they have almost nothing in
common other than the format of the page batches (not even the content),
and wedging the code together has resulted in functions substantially
more complicated than the sum of their useful parts.
~Andrew
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-10 11:21 ` Andrew Cooper
2014-04-10 13:05 ` Frediano Ziglio
@ 2014-04-14 17:49 ` George Dunlap
2014-04-14 18:06 ` Andrew Cooper
2014-04-14 18:11 ` David Vrabel
1 sibling, 2 replies; 22+ messages in thread
From: George Dunlap @ 2014-04-14 17:49 UTC (permalink / raw)
To: Andrew Cooper
Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Xen-devel,
Frediano Ziglio, David Vrabel, Jan Beulich
On Thu, Apr 10, 2014 at 12:21 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 10/04/14 11:42, Ian Campbell wrote:
>> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
>>> Some design decisions have been take very deliberately (e.g. splitting the
>>> logic for PV and hvm migration) while others have been more along the lines of
>>> "I think its a sensible thing to do given a lack of any evidence/opinion to
>>> the contrary".
>> Is there some indication of which is which?
>
> Not really, given the clean rewrite, and also that it is only partially
> complete.
>
>>
>> Should we check in the desigh/spec which was previously posted as part
>> of this?
>
> I knew I forgot something...
>
> http://xenbits.xen.org/people/andrewcoop/domain-save-format-E.pdf
What did you imagine might constitute an "Optional" record?
Other than that, everything looks sensible so far -- but having only
save/restore of one guest type is the easy bit. It's when you start
to have to multiplex across {PV, HVM, PVH} x {disk, network, remus}
that things are going to get more "interesting".
-George
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-14 17:49 ` George Dunlap
@ 2014-04-14 18:06 ` Andrew Cooper
2014-04-14 18:16 ` George Dunlap
2014-04-14 18:11 ` David Vrabel
1 sibling, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2014-04-14 18:06 UTC (permalink / raw)
To: George Dunlap
Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Xen-devel,
Frediano Ziglio, David Vrabel, Jan Beulich
On 14/04/14 18:49, George Dunlap wrote:
> On Thu, Apr 10, 2014 at 12:21 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 10/04/14 11:42, Ian Campbell wrote:
>>> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
>>>> Some design decisions have been take very deliberately (e.g. splitting the
>>>> logic for PV and hvm migration) while others have been more along the lines of
>>>> "I think its a sensible thing to do given a lack of any evidence/opinion to
>>>> the contrary".
>>> Is there some indication of which is which?
>> Not really, given the clean rewrite, and also that it is only partially
>> complete.
>>
>>> Should we check in the desigh/spec which was previously posted as part
>>> of this?
>> I knew I forgot something...
>>
>> http://xenbits.xen.org/people/andrewcoop/domain-save-format-E.pdf
>
> What did you imagine might constitute an "Optional" record?
>
> Other than that, everything looks sensible so far -- but having only
> save/restore of one guest type is the easy bit. It's when you start
> to have to multiplex across {PV, HVM, PVH} x {disk, network, remus}
> that things are going to get more "interesting".
>
> -George
I did not opt for optional records, nor did I author them into the
spec. Frankly, I cannot foresee a need for anything other than mandatory
records. Nothing required for migration can possibly be optional, and
anything else is likely to be toolstack data which necessarily has to be
ahead of the domain in the migration stream so the receiving toolstack
can create a suitable domain for the new xc_domain_restore() to restore
into.
As for multiplexing, I have been considering that given the
implementation of the live part of migration (which now works, as of a
few hours ago).
Domain save and domain restore are two very very different operations.
Save involves prodding at a live domain and stuffing values into a
stream. Restore involves taking stuff from a stream and updating paused
state.
A lot of the structure of save is common to all types of domain and
roughly follows:
    send some headers
    start logdirty
    send some preamble state (p2m size, p2m etc)
    while consulting dirty bitmap:
        send some memory
    pause domain
    send remaining memory
    send postamble state (vcpus, tsc, qemu blob etc)
    send end record
I was considering writing a common save routine with domain/arch hooks.
This would allow for one canonical implementation of the save code
(including how to do the 'live' bit, or how to apply remus) while
separating the architecture bits.
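The structure above could be sketched roughly as follows. This is only an
illustration: struct save_ops, save_common() and the pv_* hooks are
hypothetical names, and the hooks just record the order in which they are
called rather than touching a domain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct save_ctx;

/* Hypothetical per-guest-type hooks; the names are illustrative only. */
struct save_ops {
    int (*send_preamble)(struct save_ctx *ctx);            /* p2m size, p2m etc */
    int (*send_memory)(struct save_ctx *ctx, int paused);  /* one batch of pages */
    int (*send_postamble)(struct save_ctx *ctx);           /* vcpus, tsc, ... */
};

struct save_ctx {
    const struct save_ops *ops;
    unsigned int dirty_rounds;  /* stand-in for "the dirty bitmap is non-empty" */
    char trace[32];             /* order of steps, recorded for illustration */
    size_t trace_len;
};

static void trace(struct save_ctx *ctx, char step)
{
    if ( ctx->trace_len < sizeof(ctx->trace) - 1 )
        ctx->trace[ctx->trace_len++] = step;
}

/* One canonical save routine; only the hooks know about the guest type. */
static int save_common(struct save_ctx *ctx)
{
    int rc;

    assert(ctx && ctx->ops);

    trace(ctx, 'H');                            /* send some headers */
    trace(ctx, 'L');                            /* start logdirty */

    if ( (rc = ctx->ops->send_preamble(ctx)) )
        return rc;

    while ( ctx->dirty_rounds > 0 )             /* while consulting dirty bitmap */
    {
        ctx->dirty_rounds--;
        if ( (rc = ctx->ops->send_memory(ctx, 0)) )
            return rc;
    }

    trace(ctx, 'P');                            /* pause domain */

    if ( (rc = ctx->ops->send_memory(ctx, 1)) ) /* send remaining memory */
        return rc;
    if ( (rc = ctx->ops->send_postamble(ctx)) )
        return rc;

    trace(ctx, 'E');                            /* send end record */
    return 0;
}

/* Example hooks for an imaginary PV guest: they only record their call. */
static int pv_preamble(struct save_ctx *ctx) { trace(ctx, 'p'); return 0; }
static int pv_memory(struct save_ctx *ctx, int paused)
{
    trace(ctx, paused ? 'M' : 'm');
    return 0;
}
static int pv_postamble(struct save_ctx *ctx) { trace(ctx, 'v'); return 0; }

static const struct save_ops pv_ops = {
    .send_preamble = pv_preamble,
    .send_memory = pv_memory,
    .send_postamble = pv_postamble,
};
```

An HVM or ARM implementation would supply its own struct save_ops, while
the live loop (and, presumably, any remus variant of it) would exist once
in save_common().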
Restore, on the other hand, is much simpler, and just involves reading
records from the stream in the order found and applying the required
changes to the specified domain. Each architecture can have a do {}
while() loop which accepts valid records for the specific type of domain.
I will experiment with the domain/arch hooks when implementing HVM
migration (which should be rather simpler).
~Andrew
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-14 17:49 ` George Dunlap
2014-04-14 18:06 ` Andrew Cooper
@ 2014-04-14 18:11 ` David Vrabel
2014-04-15 8:30 ` Frediano Ziglio
2014-04-15 10:35 ` Ian Jackson
1 sibling, 2 replies; 22+ messages in thread
From: David Vrabel @ 2014-04-14 18:11 UTC (permalink / raw)
To: George Dunlap
Cc: Tim Deegan, Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson,
Xen-devel, Frediano Ziglio, Jan Beulich
On 14/04/14 18:49, George Dunlap wrote:
> On Thu, Apr 10, 2014 at 12:21 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 10/04/14 11:42, Ian Campbell wrote:
>>> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
>>>> Some design decisions have been take very deliberately (e.g. splitting the
>>>> logic for PV and hvm migration) while others have been more along the lines of
>>>> "I think its a sensible thing to do given a lack of any evidence/opinion to
>>>> the contrary".
>>> Is there some indication of which is which?
>>
>> Not really, given the clean rewrite, and also that it is only partially
>> complete.
>>
>>>
>>> Should we check in the desigh/spec which was previously posted as part
>>> of this?
>>
>> I knew I forgot something...
>>
>> http://xenbits.xen.org/people/andrewcoop/domain-save-format-E.pdf
>
>
> What did you imagine might constitute an "Optional" record?
This was something Ian Jackson asked for and it seems like a useful
capability to have for future use. Not sure what it might be used for
yet.
David
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-14 18:06 ` Andrew Cooper
@ 2014-04-14 18:16 ` George Dunlap
2014-04-14 23:43 ` Andrew Cooper
0 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2014-04-14 18:16 UTC (permalink / raw)
To: Andrew Cooper
Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Xen-devel,
Frediano Ziglio, David Vrabel, Jan Beulich
On 04/14/2014 07:06 PM, Andrew Cooper wrote:
> On 14/04/14 18:49, George Dunlap wrote:
>> On Thu, Apr 10, 2014 at 12:21 PM, Andrew Cooper
>> <andrew.cooper3@citrix.com> wrote:
>>> On 10/04/14 11:42, Ian Campbell wrote:
>>>> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
>>>>> Some design decisions have been take very deliberately (e.g. splitting the
>>>>> logic for PV and hvm migration) while others have been more along the lines of
>>>>> "I think its a sensible thing to do given a lack of any evidence/opinion to
>>>>> the contrary".
>>>> Is there some indication of which is which?
>>> Not really, given the clean rewrite, and also that it is only partially
>>> complete.
>>>
>>>> Should we check in the desigh/spec which was previously posted as part
>>>> of this?
>>> I knew I forgot something...
>>>
>>> http://xenbits.xen.org/people/andrewcoop/domain-save-format-E.pdf
>>
>> What did you imagine might constitute an "Optional" record?
>>
>
> I did not opt for optional records, nor did I author them into the
> spec.
So sometimes tone is hard to read in an e-mail; your tone here seems a
bit defensive, or at least rather strident; which seemed strange to me,
but when I looked back at what I wrote, I realized that it could be read
with a more sarcastic / biting tone than I intended.
So, I don't know if you read it that way, but if you did, sorry about
the misunderstanding; I was just being curious. :-)
And if you didn't mean your tone to be strident, or it was strident for
some other reason, nevermind. :-)
-George
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-14 18:16 ` George Dunlap
@ 2014-04-14 23:43 ` Andrew Cooper
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2014-04-14 23:43 UTC (permalink / raw)
To: George Dunlap
Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Xen-devel,
Frediano Ziglio, David Vrabel, Jan Beulich
On 14/04/2014 19:16, George Dunlap wrote:
> On 04/14/2014 07:06 PM, Andrew Cooper wrote:
>> On 14/04/14 18:49, George Dunlap wrote:
>>> On Thu, Apr 10, 2014 at 12:21 PM, Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>> On 10/04/14 11:42, Ian Campbell wrote:
>>>>> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
>>>>>> Some design decisions have been take very deliberately (e.g.
>>>>>> splitting the
>>>>>> logic for PV and hvm migration) while others have been more along
>>>>>> the lines of
>>>>>> "I think its a sensible thing to do given a lack of any
>>>>>> evidence/opinion to
>>>>>> the contrary".
>>>>> Is there some indication of which is which?
>>>> Not really, given the clean rewrite, and also that it is only
>>>> partially
>>>> complete.
>>>>
>>>>> Should we check in the desigh/spec which was previously posted as
>>>>> part
>>>>> of this?
>>>> I knew I forgot something...
>>>>
>>>> http://xenbits.xen.org/people/andrewcoop/domain-save-format-E.pdf
>>>
>>> What did you imagine might constitute an "Optional" record?
>>>
>>
>> I did not opt for optional records, nor did I author them into the
>> spec.
>
> So sometimes tone is hard to read in an e-mail; your tone here seems a
> bit defensive, or at least rather strident; which seemed strange to
> me, but when I looked back at what I wrote, I realized that it could
> be read with a more sarcastic / biting tone than I intended.
>
> So, I don't know if you read it that way, but if you did, sorry about
> the misunderstanding; I was just being curious. :-)
>
> And if you didn't mean your tone to be strident, or it was strident
> for some other reason, nevermind. :-)
>
> -George
>
I honestly don't know for certain what my tone was intended to be. As
far as thoughts went, it was very much "I am writing an email, but
holding people up going to the pub, where I would also like to be".
Perhaps in retrospect the wording wasn't as good as it could have been.
As for optional records themselves; I don't have an objection to them
being in the spec, but I can't see any practical use for them. They do
no harm being specified yet unused (and I have more important things to do
with my time than try to argue against them).
~Andrew
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-14 18:11 ` David Vrabel
@ 2014-04-15 8:30 ` Frediano Ziglio
2014-04-15 10:35 ` Ian Jackson
1 sibling, 0 replies; 22+ messages in thread
From: Frediano Ziglio @ 2014-04-15 8:30 UTC (permalink / raw)
To: David Vrabel
Cc: Keir Fraser, Ian Campbell, George Dunlap, Andrew Cooper,
Ian Jackson, Tim Deegan, Jan Beulich, Xen-devel
On Mon, 2014-04-14 at 19:11 +0100, David Vrabel wrote:
> On 14/04/14 18:49, George Dunlap wrote:
> > On Thu, Apr 10, 2014 at 12:21 PM, Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 10/04/14 11:42, Ian Campbell wrote:
> >>> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
> >>>> Some design decisions have been take very deliberately (e.g. splitting the
> >>>> logic for PV and hvm migration) while others have been more along the lines of
> >>>> "I think its a sensible thing to do given a lack of any evidence/opinion to
> >>>> the contrary".
> >>> Is there some indication of which is which?
> >>
> >> Not really, given the clean rewrite, and also that it is only partially
> >> complete.
> >>
> >>>
> >>> Should we check in the desigh/spec which was previously posted as part
> >>> of this?
> >>
> >> I knew I forgot something...
> >>
> >> http://xenbits.xen.org/people/andrewcoop/domain-save-format-E.pdf
> >
> >
> > What did you imagine might constitute an "Optional" record?
>
> This was something Ian Jackson asked for and it seems like a useful
> capabilitity to have for future use. Not sure what it might be used for
> yet.
>
> David
Upper layers can stick in stuff like machine descriptions (ie: "Web server
for selling") or other more technical data (original host name/ip, specific
modifications of the host like the "User agent" stuff in http, preferred
migration encapsulation).
The definition of optional is that you can discard it and the machine will
work the same.
Frediano
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-14 18:11 ` David Vrabel
2014-04-15 8:30 ` Frediano Ziglio
@ 2014-04-15 10:35 ` Ian Jackson
2014-04-15 10:38 ` George Dunlap
1 sibling, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2014-04-15 10:35 UTC (permalink / raw)
To: David Vrabel
Cc: Keir Fraser, Ian Campbell, George Dunlap, Andrew Cooper,
Tim Deegan, Xen-devel, Frediano Ziglio, Jan Beulich
David Vrabel writes ("Re: [Xen-devel] [PATCH 0/6] [VERY RFC] Migration Stream v2"):
> On 14/04/14 18:49, George Dunlap wrote:
> > What did you imagine might constitute an "Optional" record?
>
> This was something Ian Jackson asked for and it seems like a useful
> capabilitity to have for future use. Not sure what it might be used for
> yet.
Right.
Long experience with protocol design has taught me that protocols
should almost always have both an extensibility mechanism which is
ignored by ignorant receivers, and one which causes ignorant receivers
to abort.
I don't know yet what we might use it for. However, we should test
that it works (ie is ignored by) the receiver (or it will be useless).
Ian.
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-15 10:35 ` Ian Jackson
@ 2014-04-15 10:38 ` George Dunlap
0 siblings, 0 replies; 22+ messages in thread
From: George Dunlap @ 2014-04-15 10:38 UTC (permalink / raw)
To: Ian Jackson, David Vrabel
Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan, Xen-devel,
Frediano Ziglio, Jan Beulich
On 04/15/2014 11:35 AM, Ian Jackson wrote:
> David Vrabel writes ("Re: [Xen-devel] [PATCH 0/6] [VERY RFC] Migration Stream v2"):
>> On 14/04/14 18:49, George Dunlap wrote:
>>> What did you imagine might constitute an "Optional" record?
>>
>> This was something Ian Jackson asked for and it seems like a useful
>> capabilitity to have for future use. Not sure what it might be used for
>> yet.
>
> Right.
>
> Long experience with protocol design has taught me that protocols
> should almost always have both an extensibility mechanism which is
> ignored by ignorant receivers, and one which causes ignorant receivers
> to abort.
>
> I don't know yet what we might use it for. However, we should test
> that it works (ie is ignored by) the receiver (or it will be useless).
Yes, this is the main concern. 2 billion record types should be plenty
for the "required" field, so the 2 billion allocated for "optional"
shouldn't be a big loss. :-) The main risk would be if something which
is, in fact, required for proper operation on the far side is marked
"optional". I guess as long as we have an "ignore everything optional"
test case we should be OK.
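A sketch of such a test case, assuming the optional flag is the top bit of
a 32-bit record type (which is what gives the two halves of roughly 2
billion types each). The numeric values and the cut-down struct record are
illustrative assumptions, not taken from the spec; only the REC_TYPE_*
names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: illustrative values only; the real ones live in the spec. */
#define REC_TYPE_end        UINT32_C(0x00000000)
#define REC_TYPE_page_data  UINT32_C(0x00000001)
#define REC_TYPE_optional   UINT32_C(0x80000000)

/* Cut-down record: just the type, with no length or body. */
struct record {
    uint32_t type;
};

/*
 * Walk a stream of records.  Unknown records are skipped if flagged
 * optional and rejected otherwise.  Returns 0 once the end record is seen,
 * or -1 on an unknown mandatory record or a stream with no end record.
 */
static int restore_records(const struct record *recs, size_t nr,
                           unsigned int *nr_ignored)
{
    size_t i;

    assert(recs && nr_ignored);
    *nr_ignored = 0;

    for ( i = 0; i < nr; i++ )
    {
        switch ( recs[i].type )
        {
        case REC_TYPE_end:
            return 0;

        case REC_TYPE_page_data:
            break;                      /* would restore memory here */

        default:
            if ( recs[i].type & REC_TYPE_optional )
            {
                (*nr_ignored)++;        /* ignorant receiver: skip it */
                break;
            }
            return -1;                  /* ignorant receiver: abort */
        }
    }

    return -1;
}
```

Feeding the receiver a stream with made-up optional records sprinkled
through it, and checking that the restore still succeeds, would exercise
the "ignored by ignorant receivers" half of the mechanism; a made-up
mandatory type should make the same receiver fail.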
-George
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
` (6 preceding siblings ...)
2014-04-10 10:42 ` [PATCH 0/6] [VERY RFC] Migration Stream v2 Ian Campbell
@ 2014-04-23 13:47 ` Ian Campbell
2014-04-23 14:02 ` Andrew Cooper
7 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2014-04-23 13:47 UTC (permalink / raw)
To: Andrew Cooper
Cc: Keir Fraser, Tim Deegan, Ian Jackson, Xen-devel, Frediano Ziglio,
David Vrabel, Jan Beulich
On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
> Hello,
>
> Presented here for early review is a basic implementation of PV guest
> migration using the v2 stream format.
>
> PV non-live migration is believed-working; i.e. xl save/restore.
Based on comments I've seen elsewhere I think the state of the art has
moved on pretty significantly since this posting. Unless you'd like me
to, I intend to skip reviewing this iteration in favour of the next. Is
that OK, or would you still like comments on this one?
Ian.
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-23 13:47 ` Ian Campbell
@ 2014-04-23 14:02 ` Andrew Cooper
2014-04-23 14:13 ` Ian Campbell
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2014-04-23 14:02 UTC (permalink / raw)
To: Ian Campbell
Cc: Keir Fraser, Tim Deegan, Ian Jackson, Xen-devel, Frediano Ziglio,
David Vrabel, Jan Beulich
On 23/04/14 14:47, Ian Campbell wrote:
> On Wed, 2014-04-09 at 19:28 +0100, Andrew Cooper wrote:
>> Hello,
>>
>> Presented here for early review is a basic implementation of PV guest
>> migration using the v2 stream format.
>>
>> PV non-live migration is believed-working; i.e. xl save/restore.
> Based on comments I've seen elsewhere I think the state of the art has
> moved on pretty significantly since this posting. Unless you'd like me
> to, I intend to skip reviewing this iteration in favour of the next. Is
> that OK, or would you still like comments on this one?
>
> Ian.
>
Feel free to skip the review of this series if you wish.
The current state of play is that PV migration is completely working.
I am fixing a batching issue. The first version of the code did the
really obvious, dumb, slow actions, for simplicity, and is now being
updated to batch mapping hypercalls for speed.
David is working on HVM migration, which is substantially simpler
than PV.
I hope to have another series posted before the end of the week.
~Andrew
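
As a generic illustration of the batching change described above (not the
actual libxc code), the sketch below issues one call per batch of frames
instead of one call per frame. `map_batch` is a hypothetical stand-in for
whatever mapping call is used, and the batch size of 1024 is an arbitrary
value chosen for the example.

```python
def batches(pfns, batch_size):
    """Yield consecutive slices of at most batch_size frame numbers."""
    for start in range(0, len(pfns), batch_size):
        yield pfns[start:start + batch_size]


def map_all(pfns, map_batch, batch_size=1024):
    """Call map_batch once per batch and return the number of calls made."""
    calls = 0
    for batch in batches(pfns, batch_size):
        map_batch(batch)  # hypothetical: one mapping call covers the whole batch
        calls += 1
    return calls
```

With 2500 frames and a batch size of 1024 this makes three calls rather
than 2500.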
* Re: [PATCH 0/6] [VERY RFC] Migration Stream v2
2014-04-23 14:02 ` Andrew Cooper
@ 2014-04-23 14:13 ` Ian Campbell
0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2014-04-23 14:13 UTC (permalink / raw)
To: Andrew Cooper
Cc: Keir Fraser, Tim Deegan, Ian Jackson, Xen-devel, Frediano Ziglio,
David Vrabel, Jan Beulich
On Wed, 2014-04-23 at 15:02 +0100, Andrew Cooper wrote:
> Feel free to skip the review of this series if you wish.
Done ;-)
> The current state of play is that PV migration is completely working.
>
> I am fixing a batching issue. The first version of the code did the
> really obvious, dumb, slow actions, for simplicity, and is now being
> updated to batch mapping hypercalls for speed.
>
> David is working on HVM migration, which is substantially simpler
> than PV.
>
> I hope to have another series posted before the end of the week.
Excellent. </Burns>
Ian.
Thread overview: 22+ messages
2014-04-09 18:28 [PATCH 0/6] [VERY RFC] Migration Stream v2 Andrew Cooper
2014-04-09 18:28 ` [PATCH 1/6] [HACK] tools/libxc: save/restore v2 framework Andrew Cooper
2014-04-09 18:28 ` [PATCH 2/6] tools/libxc: Stream specification and some common code Andrew Cooper
2014-04-09 18:28 ` [PATCH 3/6] tools/libxc: Scripts for inspection/valdiation of legacy and new streams Andrew Cooper
2014-04-09 18:28 ` [PATCH 4/6] tools/libxc: x86 pv common code Andrew Cooper
2014-04-09 18:28 ` [PATCH 5/6] tools/libxc: x86 pv save implementation Andrew Cooper
2014-04-09 18:28 ` [PATCH 6/6] tools/libxc: x86 pv restore implementation Andrew Cooper
2014-04-10 10:42 ` [PATCH 0/6] [VERY RFC] Migration Stream v2 Ian Campbell
2014-04-10 11:21 ` Andrew Cooper
2014-04-10 13:05 ` Frediano Ziglio
2014-04-10 13:49 ` Andrew Cooper
2014-04-14 17:49 ` George Dunlap
2014-04-14 18:06 ` Andrew Cooper
2014-04-14 18:16 ` George Dunlap
2014-04-14 23:43 ` Andrew Cooper
2014-04-14 18:11 ` David Vrabel
2014-04-15 8:30 ` Frediano Ziglio
2014-04-15 10:35 ` Ian Jackson
2014-04-15 10:38 ` George Dunlap
2014-04-23 13:47 ` Ian Campbell
2014-04-23 14:02 ` Andrew Cooper
2014-04-23 14:13 ` Ian Campbell