qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v3 00/35] postcopy live migration
@ 2012-10-30  8:32 Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 01/35] migration.c: remove redundant line in migrate_init() Isaku Yamahata
                   ` (37 more replies)
  0 siblings, 38 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This is the v3 patch series of postcopy migration.

The trees are available at
git://github.com/yamahata/qemu.git qemu-postcopy-oct-30-2012
git://github.com/yamahata/linux-umem.git linux-umem-oct-29-2012

Major changes v2 -> v3:
- implemented pre+post optimization
- auto detection of postcopy by incoming side
- using threads on destination instead of fork
- using blocking io instead of select + non-blocking io loop
- less memory overhead
- various improvements and code simplification
- kernel module name change umem -> uvmem to avoid name conflict.

Patches organization:
1-2: trivial fixes
3-5: preparation for threading. cherry-picked from migration tree
6-18: refactoring existing code and preparation
19-25: implement postcopy live migration itself (essential part)
26-35: optimization/heuristic for postcopy

Usage
=====
You need to load the uvmem character device on the host before starting
migration.
Postcopy can be used with both the tcg and kvm accelerators. The implementation
depends only on the Linux uvmem character device, and the driver-dependent code
is split out into a separate file.
I tested only the host page size == guest page size case, but the
implementation allows the host page size != guest page size case.

The following options are added with this patch series.
- incoming part
  use -incoming as usual. Postcopy is automatically detected.
  example:
  qemu -incoming tcp:0:4444 -monitor stdio -machine accel=kvm

- outgoing part
  options for migrate command
  migrate [-p [-n] [-m]] URI 
          [<precopy count> [<prefault forward> [<prefault backward>]]]

  Newly added options/arguments
  -p: indicate postcopy migration
  -n: disable background transfer of pages. This is for benchmarking/debugging.
  -m: move background transfer of postcopy mode
  <precopy count>: The number of precopy RAM scans before postcopy.
                   default 0 (0 means no precopy)
  <prefault forward>: The number of forward pages that are sent along with
                      an on-demand page
  <prefault backward>: The number of backward pages that are sent along with
                       an on-demand page

  example:
  migrate -p -n tcp:<dest ip address>:4444
  migrate -p -n -m tcp:<dest ip address>:4444 42 42 0
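
To make the two prefault arguments concrete, here is a minimal sketch of how
a window of pages around a faulted page could be computed. This is not code
from the series: prefault_window(), its parameters and the clamping to the
block bounds are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch series: given the index of the
 * faulted page, compute the inclusive range [*first, *last] of pages to
 * send, clamped to a RAM block of nr_pages pages.
 */
void prefault_window(uint64_t fault, uint64_t nr_pages,
                     uint64_t forward, uint64_t backward,
                     uint64_t *first, uint64_t *last)
{
    assert(nr_pages > 0 && fault < nr_pages);

    /* Do not walk backward past the first page of the block. */
    *first = (fault > backward) ? fault - backward : 0;

    /* Do not walk forward past the last page of the block. */
    if (forward < nr_pages - 1 - fault) {
        *last = fault + forward;
    } else {
        *last = nr_pages - 1;
    }
}
```

Under this model, the second example above (prefault forward 42, prefault
backward 0) would answer a fault on page 100 with pages 100 through 142.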


TODO
====
- benchmark/evaluation
- improvement/optimization
  At the moment, the issues I'm aware of are:
  - pre+post case
    On the destination side, reading the dirty bitmap can cause long latency.
    Create a thread for that.
- consider the possibility of using FUSE/CUSE

basic postcopy work flow
========================
        qemu on the destination
              |
              V
        open(/dev/uvmem)
              |
              V
        UVMEM_INIT
              |
              V
        Here we have two file descriptors to
        umem device and shmem file
              |
              |                                  umem threads
              |                                  on the destination
              |
              V    create pipe to communicate
        create threads-------------------------------,
              |                                      |
              V                                   mmap(shmem file)
        mmap(uvmem device) for guest RAM          close(shmem file)
              |                                      |
              |                                      |
              V                                      |
        wait for ready from daemon <----pipe-----send ready message
              |                                      |
              |                                 Here the daemon takes over
        send ok------------pipe---------------> the owner of the socket
              |                                 to the source
              V                                      |
        entering post copy stage                     |
        start guest execution                        |
              |                                      |
              V                                      V
        access guest RAM                          read() to get faulted pages
              |                                      |
              V                                      V
        page fault ------------------------------>page offset is returned
        block                                        |
                                                     V
                                                  pull page from the source
                                                  write the page contents
                                                  to the shmem.
                                                     |
                                                     V
        unblock     <-----------------------------write() to tell served pages
        the fault handler returns the page           |
        page fault is resolved                       |
              |                                      V
              |                                   touch guest RAM pages
              |                                      |
              |                                      V
              |                                   release the cached page
              |                                   madvise(MADV_REMOVE)
              |
              |
              |                                   pages can be sent
              |                                   in the background
              |                                      |
              |                                      V
              |                                   mark page as cached
              |                                   Thus future page fault is
              |                                   avoided.
              |                                      |
              |                                      V
              |                                   touch guest RAM pages
              |                                      |
              |                                      V
              |                                   release the cached page
              |                                   madvise(MADV_REMOVE)
              |                                      |
              V                                      V

                 all the pages are pulled from the source

              |                                      |
              V                                      V
        migration completes                        exit()
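
The on-demand and background paths in the diagram above can be modelled in
plain user space. The sketch below is a toy model only: it does not use the
uvmem device, and struct postcopy_model, model_pull_page() and
model_guest_read() are hypothetical names. It only shows why a page that was
already pulled in the background causes no further transfer when the guest
touches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_NR_PAGES  8

/*
 * Toy model: plain arrays stand in for the source RAM, the shmem file and
 * the per-page "cached" state that the real driver would track.
 */
struct postcopy_model {
    const unsigned char *src;      /* RAM on the migration source */
    unsigned char dst[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
    bool cached[MODEL_NR_PAGES];   /* page is already on the destination */
    size_t pulled;                 /* number of pages fetched from source */
};

/* Pull one page from the source unless it is already cached. */
void model_pull_page(struct postcopy_model *m, size_t page)
{
    assert(page < MODEL_NR_PAGES);
    if (m->cached[page]) {
        return;                    /* already served, nothing to transfer */
    }
    memcpy(m->dst[page], m->src + page * MODEL_PAGE_SIZE, MODEL_PAGE_SIZE);
    m->cached[page] = true;
    m->pulled++;
}

/* Guest access: resolve the "fault" on demand, then read the byte. */
unsigned char model_guest_read(struct postcopy_model *m, size_t page,
                               size_t off)
{
    assert(off < MODEL_PAGE_SIZE);
    model_pull_page(m, page);
    return m->dst[page][off];
}
```

Calling model_pull_page() directly plays the role of the background
transfer; a later model_guest_read() on the same page leaves the pulled
counter unchanged.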


Isaku Yamahata (32):
  migration.c: remove redundant line in migrate_init()
  arch_init: DPRINTF format error and typo
  osdep: add qemu_read_full() to read interrupt-safely
  savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip,
    qemu_fflush
  savevm/QEMUFile: consolidate QEMUFile functions a bit
  savevm/QEMUFile: introduce qemu_fopen_fd
  savevm/QEMUFile: add read/write QEMUFile on memory buffer
  savevm, buffered_file: introduce method to drain buffer of buffered
    file
  arch_init: export RAM_SAVE_xxx flags for postcopy
  arch_init/ram_save: introduce constant for ram save version = 4
  arch_init: refactor ram_save_block() and export ram_save_block()
  arch_init/ram_save_setup: factor out bitmap alloc/free
  arch_init/ram_load: refactor ram_load
  arch_init: factor out logic to find ram block with id string
  migration: export migrate_fd_completed() and migrate_fd_cleanup()
  uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
  osdep: add QEMU_MADV_REMOVE and tirivial fix
  postcopy: introduce helper functions for postcopy
  savevm: add new section that is used by postcopy
  postcopy: implement incoming part of postcopy live migration
  postcopy outgoing: add -p option to migrate command
  postcopy: implement outgoing part of postcopy live migration
  postcopy/outgoing: add -n options to disable background transfer
  postcopy/outgoing: implement forward/backword prefault
  arch_init: factor out setting last_block, last_offset
  postcopy/outgoing: add movebg mode(-m) to migration command
  arch_init: factor out ram_load
  arch_init: export ram_save_iterate()
  postcopy: pre+post optimization incoming side
  arch_init: export migration_bitmap_sync and helper method to get
    bitmap
  postcopy/outgoing: introduce precopy_count parameter
  postcopy: pre+post optimization outgoing side

Paolo Bonzini (1):
  split MRU ram list

Umesh Deshpande (2):
  add a version number to ram_list
  protect the ramlist with a separate mutex

 Makefile.target                 |    2 +
 arch_init.c                     |  391 +++++---
 arch_init.h                     |   24 +
 buffered_file.c                 |   59 +-
 buffered_file.h                 |    1 +
 cpu-all.h                       |   16 +-
 exec.c                          |   62 +-
 hmp-commands.hx                 |   21 +-
 hmp.c                           |   12 +-
 linux-headers/linux/uvmem.h     |   41 +
 migration-exec.c                |    8 +-
 migration-fd.c                  |   23 +-
 migration-postcopy.c            | 2019 +++++++++++++++++++++++++++++++++++++++
 migration-tcp.c                 |   16 +-
 migration-unix.c                |   36 +-
 migration.c                     |   65 +-
 migration.h                     |   42 +
 osdep.c                         |   24 +
 osdep.h                         |   13 +-
 qapi-schema.json                |    6 +-
 qemu-common.h                   |    2 +
 qemu-file.h                     |   12 +-
 qmp-commands.hx                 |    4 +-
 savevm.c                        |  223 ++++-
 scripts/update-linux-headers.sh |    2 +-
 sysemu.h                        |    2 +-
 umem.c                          |  291 ++++++
 umem.h                          |   88 ++
 vl.c                            |    5 +-
 29 files changed, 3265 insertions(+), 245 deletions(-)
 create mode 100644 linux-headers/linux/uvmem.h
 create mode 100644 migration-postcopy.c
 create mode 100644 umem.c
 create mode 100644 umem.h

--
1.7.10.4


* [Qemu-devel] [PATCH v3 01/35] migration.c: remove redundant line in migrate_init()
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 02/35] arch_init: DPRINTF format error and typo Isaku Yamahata
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 migration.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/migration.c b/migration.c
index 62e0304..8fcb466 100644
--- a/migration.c
+++ b/migration.c
@@ -460,7 +460,6 @@ static MigrationState *migrate_init(const MigrationParams *params)
            sizeof(enabled_capabilities));
     s->xbzrle_cache_size = xbzrle_cache_size;
 
-    s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
     s->total_time = qemu_get_clock_ms(rt_clock);
 
-- 
1.7.10.4


* [Qemu-devel] [PATCH v3 02/35] arch_init: DPRINTF format error and typo
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 01/35] migration.c: remove redundant line in migrate_init() Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 03/35] split MRU ram list Isaku Yamahata
                   ` (35 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Add the missing % before PRIu64 in the DPRINTF format string and fix the
function name in the message: s/ram_save_live/ram_save_iterate/

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index e6effe8..79d4041 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -659,7 +659,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     expected_downtime = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
-    DPRINTF("ram_save_live: expected(%" PRIu64 ") <= max(" PRIu64 ")?\n",
+    DPRINTF("ram_save_iterate: expected(%" PRIu64 ") <= max(%" PRIu64 ")?\n",
             expected_downtime, migrate_max_downtime());
 
     if (expected_downtime <= migrate_max_downtime()) {
-- 
1.7.10.4


* [Qemu-devel] [PATCH v3 03/35] split MRU ram list
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 01/35] migration.c: remove redundant line in migrate_init() Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 02/35] arch_init: DPRINTF format error and typo Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 04/35] add a version number to ram_list Isaku Yamahata
                   ` (34 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

From: Paolo Bonzini <pbonzini@redhat.com>

Outside the execution threads the normal, non-MRU-ized order of
the RAM blocks should always be enough.  So manage two separate
lists, which will have separate locking rules.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch_init.c |    1 +
 cpu-all.h   |    4 +++-
 exec.c      |   18 +++++++++++++-----
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 79d4041..d6162af 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -48,6 +48,7 @@
 #include "qemu/page_cache.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "cpu-all.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
diff --git a/cpu-all.h b/cpu-all.h
index 6606432..ecbba12 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -490,8 +490,9 @@ typedef struct RAMBlock {
     ram_addr_t offset;
     ram_addr_t length;
     uint32_t flags;
-    char idstr[256];
+    QLIST_ENTRY(RAMBlock) next_mru;
     QLIST_ENTRY(RAMBlock) next;
+    char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
     int fd;
 #endif
@@ -499,6 +500,7 @@ typedef struct RAMBlock {
 
 typedef struct RAMList {
     uint8_t *phys_dirty;
+    QLIST_HEAD(, RAMBlock) blocks_mru;
     QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
 extern RAMList ram_list;
diff --git a/exec.c b/exec.c
index b0ed593..489d924 100644
--- a/exec.c
+++ b/exec.c
@@ -56,6 +56,7 @@
 #include "xen-mapcache.h"
 #include "trace.h"
 #endif
+#include "cpu-all.h"
 
 #include "cputlb.h"
 
@@ -96,7 +97,10 @@ static uint8_t *code_gen_ptr;
 int phys_ram_fd;
 static int in_migration;
 
-RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks) };
+RAMList ram_list = {
+    .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks),
+    .blocks_mru = QLIST_HEAD_INITIALIZER(ram_list.blocks_mru)
+};
 
 static MemoryRegion *system_memory;
 static MemoryRegion *system_io;
@@ -641,6 +645,7 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
+    qemu_mutex_init(&ram_list.mutex);
     memory_map_init();
     io_mem_init();
 #endif
@@ -2563,6 +2568,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     new_block->length = size;
 
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
+    QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
@@ -2591,6 +2597,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
+            QLIST_REMOVE(block, next_mru);
             g_free(block);
             return;
         }
@@ -2604,6 +2611,7 @@ void qemu_ram_free(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
+            QLIST_REMOVE(block, next_mru);
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
             } else if (mem_path) {
@@ -2709,12 +2717,12 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 {
     RAMBlock *block;
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
         if (addr - block->offset < block->length) {
             /* Move this entry to to start of the list.  */
             if (block != QLIST_FIRST(&ram_list.blocks)) {
-                QLIST_REMOVE(block, next);
-                QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
+                QLIST_REMOVE(block, next_mru);
+                QLIST_INSERT_HEAD(&ram_list.blocks_mru, block, next_mru);
             }
             if (xen_enabled()) {
                 /* We need to check if the requested address is in the RAM
@@ -2809,7 +2817,7 @@ int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
         return 0;
     }
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
         /* This case append when the block is not mapped. */
         if (block->host == NULL) {
             continue;
-- 
1.7.10.4
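
For readers unfamiliar with the idea, the following is a minimal sketch of
keeping one element on two intrusive lists, one in stable order and one in
most-recently-used order. It uses the BSD <sys/queue.h> LIST macros rather
than QEMU's QLIST macros, and struct block, struct ram_list_model,
model_add() and model_lookup() are hypothetical names, not the code in this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct block {
    unsigned long offset;
    unsigned long length;
    LIST_ENTRY(block) next;      /* stable order, never reshuffled */
    LIST_ENTRY(block) next_mru;  /* reordered on every successful lookup */
};

LIST_HEAD(block_list, block);

struct ram_list_model {
    struct block_list blocks;
    struct block_list blocks_mru;
};

/* A new block is linked into both lists. */
void model_add(struct ram_list_model *rl, struct block *b)
{
    LIST_INSERT_HEAD(&rl->blocks, b, next);
    LIST_INSERT_HEAD(&rl->blocks_mru, b, next_mru);
}

/* Look up by address; only the MRU list is walked and reordered. */
struct block *model_lookup(struct ram_list_model *rl, unsigned long addr)
{
    struct block *b;

    LIST_FOREACH(b, &rl->blocks_mru, next_mru) {
        if (addr - b->offset < b->length) {
            if (b != LIST_FIRST(&rl->blocks_mru)) {
                LIST_REMOVE(b, next_mru);
                LIST_INSERT_HEAD(&rl->blocks_mru, b, next_mru);
            }
            return b;
        }
    }
    return NULL;
}
```

Note that this sketch compares against the head of the MRU list itself
before moving an entry. Code that iterates the stable list, such as a
migration loop, is unaffected by lookups.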


* [Qemu-devel] [PATCH v3 04/35] add a version number to ram_list
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (2 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 03/35] split MRU ram list Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 05/35] protect the ramlist with a separate mutex Isaku Yamahata
                   ` (33 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, Umesh Deshpande, avi, pbonzini, chegu_vinod

From: Umesh Deshpande <udeshpan@redhat.com>

This will be used to detect if last_block might have become invalid
across different calls to ram_save_live.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
---
 arch_init.c |    7 ++++++-
 cpu-all.h   |    1 +
 exec.c      |    5 ++++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d6162af..eb36a6a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -336,6 +336,7 @@ static RAMBlock *last_block;
 static ram_addr_t last_offset;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
+static uint32_t last_version;
 
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
                                                          ram_addr_t offset)
@@ -406,7 +407,6 @@ static void migration_bitmap_sync(void)
     }
 }
 
-
 /*
  * ram_save_block: Writes a page of memory to the stream f
  *
@@ -558,6 +558,7 @@ static void reset_ram_globals(void)
 {
     last_block = NULL;
     last_offset = 0;
+    last_version = ram_list.version;
     sort_ram_list();
 }
 
@@ -613,6 +614,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     uint64_t expected_downtime;
     MigrationState *s = migrate_get_current();
 
+    if (ram_list.version != last_version) {
+        reset_ram_globals();
+    }
+
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);
 
diff --git a/cpu-all.h b/cpu-all.h
index ecbba12..84aea8b 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -500,6 +500,7 @@ typedef struct RAMBlock {
 
 typedef struct RAMList {
     uint8_t *phys_dirty;
+    uint32_t version;
     QLIST_HEAD(, RAMBlock) blocks_mru;
     QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
diff --git a/exec.c b/exec.c
index 489d924..f5a8aca 100644
--- a/exec.c
+++ b/exec.c
@@ -645,7 +645,6 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
-    qemu_mutex_init(&ram_list.mutex);
     memory_map_init();
     io_mem_init();
 #endif
@@ -2570,6 +2569,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
     QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
+    ram_list.version++;
+
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
@@ -2598,6 +2599,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
+            ram_list.version++;
             g_free(block);
             return;
         }
@@ -2612,6 +2614,7 @@ void qemu_ram_free(ram_addr_t addr)
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
+            ram_list.version++;
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
             } else if (mem_path) {
-- 
1.7.10.4
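
The version-number technique used here can be sketched in isolation as
follows. This is only an illustration under assumed names (struct
list_model, struct cursor_model, model_list_changed() and
model_cursor_validate() are hypothetical): a cached position is trusted only
while the list version it was taken at is still current.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_model {
    uint32_t version;         /* bumped on every insert or remove */
};

struct cursor_model {
    const void *last_block;   /* cached position inside the list */
    uint32_t last_version;    /* list version when the cursor was set */
};

/* Every structural change to the list must call this. */
void model_list_changed(struct list_model *l)
{
    l->version++;
}

/*
 * Returns 1 if the cursor was stale and has been reset, 0 if it is still
 * valid and can be used to resume iteration.
 */
int model_cursor_validate(struct cursor_model *c, const struct list_model *l)
{
    if (c->last_version == l->version) {
        return 0;
    }
    c->last_block = NULL;
    c->last_version = l->version;
    return 1;
}
```

A 32-bit counter can wrap in principle, but only after 2^32 list changes
between two checks.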


* [Qemu-devel] [PATCH v3 05/35] protect the ramlist with a separate mutex
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (3 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 04/35] add a version number to ram_list Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 06/35] osdep: add qemu_read_full() to read interrupt-safely Isaku Yamahata
                   ` (32 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, Umesh Deshpande, avi, pbonzini, chegu_vinod

From: Umesh Deshpande <udeshpan@redhat.com>

Add the new mutex that protects shared state between ram_save_live
and the iothread.  If the iothread mutex has to be taken together
with the ramlist mutex, the iothread shall always be _outside_.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |    9 ++++++++-
 cpu-all.h   |    8 ++++++++
 exec.c      |   23 +++++++++++++++++++++--
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index eb36a6a..a312434 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -553,7 +553,6 @@ static void ram_migration_cancel(void *opaque)
     migration_end();
 }
 
-
 static void reset_ram_globals(void)
 {
     last_block = NULL;
@@ -573,6 +572,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     bitmap_set(migration_bitmap, 1, ram_pages);
     migration_dirty_pages = ram_pages;
 
+    qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
     reset_ram_globals();
 
@@ -600,6 +600,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         qemu_put_be64(f, block->length);
     }
 
+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     return 0;
@@ -614,6 +615,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     uint64_t expected_downtime;
     MigrationState *s = migrate_get_current();
 
+    qemu_mutex_lock_ramlist();
+
     if (ram_list.version != last_version) {
         reset_ram_globals();
     }
@@ -662,6 +665,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         bwidth = 0.000001;
     }
 
+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     expected_downtime = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
@@ -682,6 +686,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     migration_bitmap_sync();
 
+    qemu_mutex_lock_ramlist();
+
     /* try transferring iterative blocks of memory */
 
     /* flush all remaining blocks regardless of rate limiting */
@@ -697,6 +703,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     }
     memory_global_dirty_log_stop();
 
+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     g_free(migration_bitmap);
diff --git a/cpu-all.h b/cpu-all.h
index 84aea8b..b5fefc8 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -22,6 +22,7 @@
 #include "qemu-common.h"
 #include "qemu-tls.h"
 #include "cpu-common.h"
+#include "qemu-thread.h"
 
 /* some important defines:
  *
@@ -490,7 +491,9 @@ typedef struct RAMBlock {
     ram_addr_t offset;
     ram_addr_t length;
     uint32_t flags;
+    /* Protected by the iothread lock.  */
     QLIST_ENTRY(RAMBlock) next_mru;
+    /* Protected by the ramlist lock.  */
     QLIST_ENTRY(RAMBlock) next;
     char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
@@ -499,9 +502,12 @@ typedef struct RAMBlock {
 } RAMBlock;
 
 typedef struct RAMList {
+    QemuMutex mutex;
+    /* Protected by the iothread lock.  */
     uint8_t *phys_dirty;
     uint32_t version;
     QLIST_HEAD(, RAMBlock) blocks_mru;
+    /* Protected by the ramlist lock.  */
     QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
 extern RAMList ram_list;
@@ -521,6 +527,8 @@ extern int mem_prealloc;
 
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 ram_addr_t last_ram_offset(void);
+void qemu_mutex_lock_ramlist(void);
+void qemu_mutex_unlock_ramlist(void);
 #endif /* !CONFIG_USER_ONLY */
 
 int cpu_memory_rw_debug(CPUArchState *env, target_ulong addr,
diff --git a/exec.c b/exec.c
index f5a8aca..1414654 100644
--- a/exec.c
+++ b/exec.c
@@ -645,6 +645,7 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
+    qemu_mutex_init(&ram_list.mutex);
     memory_map_init();
     io_mem_init();
 #endif
@@ -2324,6 +2325,16 @@ void qemu_flush_coalesced_mmio_buffer(void)
         kvm_flush_coalesced_mmio_buffer();
 }
 
+void qemu_mutex_lock_ramlist(void)
+{
+    qemu_mutex_lock(&ram_list.mutex);
+}
+
+void qemu_mutex_unlock_ramlist(void)
+{
+    qemu_mutex_unlock(&ram_list.mutex);
+}
+
 #if defined(__linux__) && !defined(TARGET_S390X)
 
 #include <sys/vfs.h>
@@ -2505,6 +2516,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
     }
     pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
+    qemu_mutex_lock_ramlist();
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
             fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
@@ -2512,6 +2524,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
             abort();
         }
     }
+    qemu_mutex_unlock_ramlist();
 }
 
 static int memory_try_enable_merging(void *addr, size_t len)
@@ -2535,6 +2548,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     size = TARGET_PAGE_ALIGN(size);
     new_block = g_malloc0(sizeof(*new_block));
 
+    qemu_mutex_lock_ramlist();
     new_block->mr = mr;
     new_block->offset = find_ram_offset(size);
     if (host) {
@@ -2570,6 +2584,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
     ram_list.version++;
+    qemu_mutex_unlock_ramlist();
 
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
@@ -2595,21 +2610,24 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
 {
     RAMBlock *block;
 
+    qemu_mutex_lock_ramlist();
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
             ram_list.version++;
             g_free(block);
-            return;
+            break;
         }
     }
+    qemu_mutex_unlock_ramlist();
 }
 
 void qemu_ram_free(ram_addr_t addr)
 {
     RAMBlock *block;
 
+    qemu_mutex_lock_ramlist();
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
@@ -2640,9 +2658,10 @@ void qemu_ram_free(ram_addr_t addr)
 #endif
             }
             g_free(block);
-            return;
+            break;
         }
     }
+    qemu_mutex_unlock_ramlist();
 
 }
 
-- 
1.7.10.4
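
The locking rule in the commit message (the iothread lock is always taken
outside the ramlist lock) can be illustrated with plain pthread mutexes. The
sketch below is not QEMU code. The two mutexes and the functions
update_with_both_locks(), update_with_ramlist_only() and model_counter() are
stand-ins chosen for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins for the global iothread lock and the new ramlist lock. */
static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ramlist_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;   /* protected by ramlist_lock */

/* When both locks are needed, the iothread lock is the outer one. */
void update_with_both_locks(void)
{
    pthread_mutex_lock(&iothread_lock);   /* outer */
    pthread_mutex_lock(&ramlist_lock);    /* inner */
    shared_counter++;
    pthread_mutex_unlock(&ramlist_lock);
    pthread_mutex_unlock(&iothread_lock);
}

/* A thread that only needs the list takes the inner lock alone. */
void update_with_ramlist_only(void)
{
    pthread_mutex_lock(&ramlist_lock);
    shared_counter++;
    pthread_mutex_unlock(&ramlist_lock);
}

int model_counter(void)
{
    int v;

    pthread_mutex_lock(&ramlist_lock);
    v = shared_counter;
    pthread_mutex_unlock(&ramlist_lock);
    return v;
}
```

As long as no code path takes the iothread lock while already holding the
ramlist lock, the two locks cannot deadlock against each other.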


* [Qemu-devel] [PATCH v3 06/35] osdep: add qemu_read_full() to read interrupt-safely
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (4 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 05/35] protect the ramlist with a separate mutex Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 07/35] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip, qemu_fflush Isaku Yamahata
                   ` (31 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This is the read counterpart of qemu_write_full().

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 osdep.c       |   24 ++++++++++++++++++++++++
 qemu-common.h |    2 ++
 2 files changed, 26 insertions(+)

diff --git a/osdep.c b/osdep.c
index 3b25297..416ffe1 100644
--- a/osdep.c
+++ b/osdep.c
@@ -261,6 +261,30 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     return total;
 }
 
+ssize_t qemu_read_full(int fd, void *buf, size_t count)
+{
+    ssize_t ret = 0;
+    ssize_t total = 0;
+
+    while (count) {
+        ret = read(fd, buf, count);
+        if (ret < 0) {
+            if (errno == EINTR)
+                continue;
+            break;
+        }
+        if (ret == 0) {
+            break;
+        }
+
+        count -= ret;
+        buf += ret;
+        total += ret;
+    }
+
+    return total;
+}
+
 /*
  * Opens a socket with FD_CLOEXEC set
  */
diff --git a/qemu-common.h b/qemu-common.h
index b54612b..16128c5 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -214,6 +214,8 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
     QEMU_WARN_UNUSED_RESULT;
+ssize_t qemu_read_full(int fd, void *buf, size_t count)
+    QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
     QEMU_WARN_UNUSED_RESULT;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread
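
Editorial note on patch 06/35: the retry loop added by this patch can be
sketched outside QEMU roughly as follows. This is only an illustration under
stated assumptions: read_full() is a hypothetical stand-in name, not the QEMU
function, and it uses a byte pointer for the arithmetic instead of relying on
the GCC extension that allows arithmetic on void *.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical standalone analogue of qemu_read_full(); not QEMU code. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;     /* byte pointer for portable arithmetic */
    ssize_t total = 0;

    assert(buf != NULL || count == 0);

    while (count > 0) {
        ssize_t ret = read(fd, p, count);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted by a signal: retry */
            }
            break;              /* other error: return what was read so far */
        }
        if (ret == 0) {
            break;              /* end of file */
        }
        count -= (size_t)ret;
        p += ret;
        total += ret;
    }

    return total;
}
```

As in the patch, a short return value means end of file or an error, so a
caller that needs to tell the two apart has to inspect errno itself.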

* [Qemu-devel] [PATCH v3 07/35] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip, qemu_fflush
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (5 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 06/35] osdep: add qemu_read_full() to read interrupt-safely Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 08/35] savevm/QEMUFile: consolidate QEMUFile functions a bit Isaku Yamahata
                   ` (30 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

These functions will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 qemu-file.h |    4 ++++
 savevm.c    |    8 ++++----
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 9c8985b..9b6dd08 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -72,6 +72,7 @@ QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
+int qemu_fflush(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
 
@@ -87,6 +88,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 int qemu_get_byte(QEMUFile *f);
+int qemu_peek_byte(QEMUFile *f, int offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+void qemu_file_skip(QEMUFile *f, int size);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index b080d37..0c7af43 100644
--- a/savevm.c
+++ b/savevm.c
@@ -448,7 +448,7 @@ static void qemu_file_set_error(QEMUFile *f, int ret)
 /** Flushes QEMUFile buffer
  *
  */
-static int qemu_fflush(QEMUFile *f)
+int qemu_fflush(QEMUFile *f)
 {
     int ret = 0;
 
@@ -583,14 +583,14 @@ void qemu_put_byte(QEMUFile *f, int v)
     }
 }
 
-static void qemu_file_skip(QEMUFile *f, int size)
+void qemu_file_skip(QEMUFile *f, int size)
 {
     if (f->buf_index + size <= f->buf_size) {
         f->buf_index += size;
     }
 }
 
-static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
     int pending;
     int index;
@@ -638,7 +638,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
     return done;
 }
 
-static int qemu_peek_byte(QEMUFile *f, int offset)
+int qemu_peek_byte(QEMUFile *f, int offset)
 {
     int index = f->buf_index + offset;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread
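
Editorial note on patch 07/35: the difference between peeking and skipping
can be illustrated with a toy reader over a fixed buffer. This is a sketch
only. struct peek_reader and both helpers are hypothetical names, and unlike
the real QEMUFile functions they never refill the buffer from a backend. The
out-of-range return value of -1 is also an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy reader; invariant: index <= size. */
struct peek_reader {
    const uint8_t *buf;
    size_t index;   /* next byte to be consumed */
    size_t size;    /* number of valid bytes in buf */
};

/* Return the byte at index + offset without consuming it, or -1. */
static int reader_peek_byte(const struct peek_reader *r, size_t offset)
{
    assert(r != NULL && r->index <= r->size);
    if (offset >= r->size - r->index) {
        return -1;
    }
    return r->buf[r->index + offset];
}

/* Consume n bytes; like qemu_file_skip(), do nothing if n is too large. */
static void reader_skip(struct peek_reader *r, size_t n)
{
    assert(r != NULL && r->index <= r->size);
    if (n <= r->size - r->index) {
        r->index += n;
    }
}
```

Peeking twice at the same offset yields the same byte; only a skip moves the
read position forward.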

* [Qemu-devel] [PATCH v3 08/35] savevm/QEMUFile: consolidate QEMUFile functions a bit
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (6 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 07/35] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip, qemu_fflush Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 09/35] savevm/QEMUFile: introduce qemu_fopen_fd Isaku Yamahata
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

- add qemu_file_fd() for later use
- drop qemu_stdio_fd
  Now qemu_file_fd() replaces qemu_stdio_fd().
- savevm/QEMUFileSocket: drop duplicated member fd
  fd is already stored in QEMUFile so drop duplicated member
   QEMUFileSocket::fd.
- remove QEMUFileSocket

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 migration-exec.c |    4 ++--
 migration-fd.c   |    2 +-
 qemu-file.h      |    2 +-
 savevm.c         |   40 +++++++++++++++++++---------------------
 4 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 6c97db9..95e9779 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -98,7 +98,7 @@ static void exec_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;
 
     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }
 
@@ -113,7 +113,7 @@ int exec_start_incoming_migration(const char *command)
         return -errno;
     }
 
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL,
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL,
 			 exec_accept_incoming_migration, NULL, f);
 
     return 0;
diff --git a/migration-fd.c b/migration-fd.c
index 7335167..b3c54e5 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -104,7 +104,7 @@ static void fd_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;
 
     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }
 
diff --git a/qemu-file.h b/qemu-file.h
index 9b6dd08..bc222dc 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -70,7 +70,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
-int qemu_stdio_fd(QEMUFile *f);
+int qemu_file_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
diff --git a/savevm.c b/savevm.c
index 0c7af43..e24041b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -178,6 +178,7 @@ struct QEMUFile {
     uint8_t buf[IO_BUF_SIZE];
 
     int last_error;
+    int fd;     /* -1 means fd isn't associated */
 };
 
 typedef struct QEMUFileStdio
@@ -186,19 +187,18 @@ typedef struct QEMUFileStdio
     QEMUFile *file;
 } QEMUFileStdio;
 
-typedef struct QEMUFileSocket
+typedef struct QEMUFileFD
 {
-    int fd;
     QEMUFile *file;
-} QEMUFileSocket;
+} QEMUFileFD;
 
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     ssize_t len;
 
     do {
-        len = qemu_recv(s->fd, buf, size, 0);
+        len = qemu_recv(s->file->fd, buf, size, 0);
     } while (len == -1 && socket_error() == EINTR);
 
     if (len == -1)
@@ -207,9 +207,9 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }
 
-static int socket_close(void *opaque)
+static int fd_close(void *opaque)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     g_free(s);
     return 0;
 }
@@ -276,6 +276,7 @@ QEMUFile *qemu_popen(FILE *stdio_file, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_pclose, 
 				 NULL, NULL, NULL);
     }
+    s->file->fd = fileno(stdio_file);
     return s->file;
 }
 
@@ -291,17 +292,6 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
     return qemu_popen(popen_file, mode);
 }
 
-int qemu_stdio_fd(QEMUFile *f)
-{
-    QEMUFileStdio *p;
-    int fd;
-
-    p = (QEMUFileStdio *)f->opaque;
-    fd = fileno(p->stdio_file);
-
-    return fd;
-}
-
 QEMUFile *qemu_fdopen(int fd, const char *mode)
 {
     QEMUFileStdio *s;
@@ -325,6 +315,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_fclose, 
 				 NULL, NULL, NULL);
     }
+    s->file->fd = fd;
     return s->file;
 
 fail:
@@ -334,11 +325,11 @@ fail:
 
 QEMUFile *qemu_fopen_socket(int fd)
 {
-    QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
+    QEMUFileFD *s = g_malloc0(sizeof(QEMUFileFD));
 
-    s->fd = fd;
-    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close, 
+    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, fd_close,
 			     NULL, NULL, NULL);
+    s->file->fd = fd;
     return s->file;
 }
 
@@ -381,6 +372,7 @@ QEMUFile *qemu_fopen(const char *filename, const char *mode)
         s->file = qemu_fopen_ops(s, NULL, file_get_buffer, stdio_fclose, 
 			       NULL, NULL, NULL);
     }
+    s->file->fd = fileno(s->stdio_file);
     return s->file;
 fail:
     g_free(s);
@@ -431,10 +423,16 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
     f->set_rate_limit = set_rate_limit;
     f->get_rate_limit = get_rate_limit;
     f->is_write = 0;
+    f->fd = -1;
 
     return f;
 }
 
+int qemu_file_fd(QEMUFile *f)
+{
+    return f->fd;
+}
+
 int qemu_file_get_error(QEMUFile *f)
 {
     return f->last_error;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Qemu-devel] [PATCH v3 09/35] savevm/QEMUFile: introduce qemu_fopen_fd
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (7 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 08/35] savevm/QEMUFile: consolidate QEMUFile functions a bit Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 10/35] savevm/QEMUFile: add read/write QEMUFile on memory buffer Isaku Yamahata
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Introduce an fd read/write backend for QEMUFile whose fd can be non-blocking.
This will be used by postcopy live migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 qemu-file.h |    1 +
 savevm.c    |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/qemu-file.h b/qemu-file.h
index bc222dc..94557ea 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -68,6 +68,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_fd(int fd, const char *mode);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_file_fd(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index e24041b..712b7ae 100644
--- a/savevm.c
+++ b/savevm.c
@@ -207,6 +207,19 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }
 
+static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileFD *s = opaque;
+    return qemu_read_full(s->file->fd, buf, size);
+}
+
+static int fd_put_buffer(void *opaque,
+                         const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileFD *s = opaque;
+    return qemu_write_full(s->file->fd, buf, size);
+}
+
 static int fd_close(void *opaque)
 {
     QEMUFileFD *s = opaque;
@@ -333,6 +346,28 @@ QEMUFile *qemu_fopen_socket(int fd)
     return s->file;
 }
 
+QEMUFile *qemu_fopen_fd(int fd, const char *mode)
+{
+    QEMUFileFD *s;
+
+    if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') || mode[1] != 0) {
+        fprintf(stderr, "qemu_fopen_fd: Argument validity check failed\n");
+        return NULL;
+    }
+
+    s = g_malloc0(sizeof(*s));
+    if (mode[0] == 'r') {
+        s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close,
+                                 NULL, NULL, NULL);
+    } else {
+        s->file = qemu_fopen_ops(s, fd_put_buffer, NULL, fd_close,
+                                 NULL, NULL, NULL);
+    }
+
+    s->file->fd = fd;
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
                             int64_t pos, int size)
 {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread
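
Editorial note on patch 09/35: the argument check at the top of
qemu_fopen_fd() accepts exactly the strings "r" and "w". A standalone sketch
of that check, with a hypothetical enum and function name that do not exist
in QEMU, might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for the sketch; not part of QEMU. */
enum fd_mode { FD_MODE_INVALID, FD_MODE_READ, FD_MODE_WRITE };

/* Accept exactly "r" or "w"; anything else, including NULL, is invalid. */
static enum fd_mode parse_fd_mode(const char *mode)
{
    if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') ||
        mode[1] != '\0') {
        return FD_MODE_INVALID;
    }
    return mode[0] == 'r' ? FD_MODE_READ : FD_MODE_WRITE;
}
```

Because the conditions short-circuit, mode[1] is only examined when mode[0]
is 'r' or 'w', so the empty string is rejected without reading past its
terminator.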

* [Qemu-devel] [PATCH v3 10/35] savevm/QEMUFile: add read/write QEMUFile on memory buffer
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (8 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 09/35] savevm/QEMUFile: introduce qemu_fopen_fd Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 11/35] savevm, buffered_file: introduce method to drain buffer of buffered file Isaku Yamahata
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This will be used by the incoming side of postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 qemu-file.h |    4 ++++
 savevm.c    |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/qemu-file.h b/qemu-file.h
index 94557ea..452efcd 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -71,6 +71,10 @@ QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_fopen_fd(int fd, const char *mode);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+struct QEMUFileBuf;
+typedef struct QEMUFileBuf QEMUFileBuf;
+QEMUFileBuf *qemu_fopen_buf_write(void);
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size);
 int qemu_file_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 712b7ae..7e55dce 100644
--- a/savevm.c
+++ b/savevm.c
@@ -368,6 +368,66 @@ QEMUFile *qemu_fopen_fd(int fd, const char *mode)
     return s->file;
 }
 
+struct QEMUFileBuf {
+    QEMUFile *file;
+    uint8_t *buffer;
+    size_t buffer_size;
+    size_t buffer_capacity;
+};
+
+static int buf_close(void *opaque)
+{
+    QEMUFileBuf *s = opaque;
+    g_free(s->buffer);
+    g_free(s);
+    return 0;
+}
+
+static int buf_put_buffer(void *opaque,
+                          const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+
+    int inc = size - (s->buffer_capacity - s->buffer_size);
+    if (inc > 0) {
+        s->buffer_capacity += DIV_ROUND_UP(inc, IO_BUF_SIZE) * IO_BUF_SIZE;
+        s->buffer = g_realloc(s->buffer, s->buffer_capacity);
+    }
+    memcpy(s->buffer + s->buffer_size, buf, size);
+    s->buffer_size += size;
+
+    return size;
+}
+
+QEMUFileBuf *qemu_fopen_buf_write(void)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+    s->file = qemu_fopen_ops(s,  buf_put_buffer, NULL, buf_close,
+                             NULL, NULL, NULL);
+    return s;
+}
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+    ssize_t len = MIN(size, s->buffer_capacity - s->buffer_size);
+    memcpy(buf, s->buffer + s->buffer_size, len);
+    s->buffer_size += len;
+    return len;
+}
+
+/* This gets the ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+    s->buffer = buf;
+    s->buffer_size = 0; /* this is used as index to read */
+    s->buffer_capacity = size;
+    s->file = qemu_fopen_ops(s, NULL, buf_get_buffer, buf_close,
+                             NULL, NULL, NULL);
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
                             int64_t pos, int size)
 {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread
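
Editorial note on patch 10/35: the write side grows its buffer in whole
chunks, and the read side copies out at most what remains. The following
sketch shows the same idea outside QEMU. The names struct membuf,
membuf_put() and membuf_get() are hypothetical, CHUNK_SIZE is an arbitrary
value standing in for IO_BUF_SIZE, and the sketch keeps a separate read
position instead of reusing the size field as the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096   /* assumption: stands in for IO_BUF_SIZE */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical growable buffer; zero-initialize before first use. */
struct membuf {
    uint8_t *data;
    size_t size;        /* bytes stored */
    size_t capacity;    /* bytes allocated */
};

/* Append len bytes, growing capacity by a multiple of CHUNK_SIZE. */
static int membuf_put(struct membuf *b, const uint8_t *src, size_t len)
{
    size_t room = b->capacity - b->size;

    assert(src != NULL || len == 0);
    if (len == 0) {
        return 0;
    }
    if (len > room) {
        size_t inc = DIV_ROUND_UP(len - room, CHUNK_SIZE) * CHUNK_SIZE;
        uint8_t *p = realloc(b->data, b->capacity + inc);
        if (p == NULL) {
            return -1;  /* unlike g_realloc(), realloc() can fail */
        }
        b->data = p;
        b->capacity += inc;
    }
    memcpy(b->data + b->size, src, len);
    b->size += len;
    return 0;
}

/* Copy out up to len bytes starting at *pos and advance *pos. */
static size_t membuf_get(const struct membuf *b, size_t *pos,
                         uint8_t *dst, size_t len)
{
    size_t avail, n;

    assert(*pos <= b->size);
    avail = b->size - *pos;
    n = len < avail ? len : avail;
    if (n > 0) {
        memcpy(dst, b->data + *pos, n);
    }
    *pos += n;
    return n;
}
```

Growing in fixed-size chunks keeps the number of reallocations small when
many short writes arrive in a row.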

* [Qemu-devel] [PATCH v3 11/35] savevm, buffered_file: introduce method to drain buffer of buffered file
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (9 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 10/35] savevm/QEMUFile: add read/write QEMUFile on memory buffer Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 12/35] arch_init: export RAM_SAVE_xxx flags for postcopy Isaku Yamahata
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Introduce a new method to drain the buffer of QEMUFileBuffered.
During postcopy migration, the buffer size can grow without bound.
To keep the buffer reasonably small, introduce a method that waits for
the buffer to drain.
Also detect that the output is unfrozen by select, not only by the timer,
so that pending data can be sent quickly.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 buffered_file.c |   59 +++++++++++++++++++++++++++++++++++++++++++++----------
 buffered_file.h |    1 +
 qemu-file.h     |    1 +
 savevm.c        |    7 +++++++
 4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index ed92df1..275d504 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -26,12 +26,14 @@ typedef struct QEMUFileBuffered
     MigrationState *migration_state;
     QEMUFile *file;
     int freeze_output;
+    bool no_limit;
     size_t bytes_xfer;
     size_t xfer_limit;
     uint8_t *buffer;
     size_t buffer_size;
     size_t buffer_capacity;
     QEMUTimer *timer;
+    int unfreeze_fd;
 } QEMUFileBuffered;
 
 #ifdef DEBUG_BUFFERED_FILE
@@ -42,6 +44,16 @@ typedef struct QEMUFileBuffered
     do { } while (0)
 #endif
 
+static ssize_t buffered_flush(QEMUFileBuffered *s);
+
+static void buffered_unfreeze(void *opaque)
+{
+    QEMUFileBuffered *s = opaque;
+    qemu_set_fd_handler(s->unfreeze_fd, NULL, NULL, NULL);
+    s->freeze_output = 0;
+    buffered_flush(s);
+}
+
 static void buffered_append(QEMUFileBuffered *s,
                             const uint8_t *buf, size_t size)
 {
@@ -65,7 +77,8 @@ static ssize_t buffered_flush(QEMUFileBuffered *s)
 
     DPRINTF("flushing %zu byte(s) of data\n", s->buffer_size);
 
-    while (s->bytes_xfer < s->xfer_limit && offset < s->buffer_size) {
+    while ((s->bytes_xfer < s->xfer_limit && offset < s->buffer_size) ||
+           s->no_limit) {
 
         ret = migrate_fd_put_buffer(s->migration_state, s->buffer + offset,
                                     s->buffer_size - offset);
@@ -73,6 +86,15 @@ static ssize_t buffered_flush(QEMUFileBuffered *s)
             DPRINTF("backend not ready, freezing\n");
             ret = 0;
             s->freeze_output = 1;
+            if (!s->no_limit) {
+                if (s->unfreeze_fd == -1) {
+                    s->unfreeze_fd = dup(s->migration_state->fd);
+                }
+                if (s->unfreeze_fd >= 0) {
+                    qemu_set_fd_handler(s->unfreeze_fd,
+                                        NULL, buffered_unfreeze, s);
+                }
+            }
             break;
         }
 
@@ -113,7 +135,7 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
     s->freeze_output = 0;
 
     if (size > 0) {
-        DPRINTF("buffering %d bytes\n", size - offset);
+        DPRINTF("buffering %d bytes\n", size);
         buffered_append(s, buf, size);
     }
 
@@ -134,17 +156,11 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
     return size;
 }
 
-static int buffered_close(void *opaque)
+static void buffered_drain(QEMUFileBuffered *s)
 {
-    QEMUFileBuffered *s = opaque;
-    ssize_t ret = 0;
-    int ret2;
-
-    DPRINTF("closing\n");
-
     s->xfer_limit = INT_MAX;
     while (!qemu_file_get_error(s->file) && s->buffer_size) {
-        ret = buffered_flush(s);
+        ssize_t ret = buffered_flush(s);
         if (ret < 0) {
             break;
         }
@@ -153,13 +169,27 @@ static int buffered_close(void *opaque)
             if (ret < 0) {
                 break;
             }
+            s->freeze_output = 0;
         }
     }
+}
+
+static int buffered_close(void *opaque)
+{
+    QEMUFileBuffered *s = opaque;
+    ssize_t ret = 0;
+    int ret2;
 
+    DPRINTF("closing\n");
+
+    buffered_drain(s);
     ret2 = migrate_fd_close(s->migration_state);
     if (ret >= 0) {
         ret = ret2;
     }
+    if (s->unfreeze_fd >= 0) {
+        close(s->unfreeze_fd);
+    }
     qemu_del_timer(s->timer);
     qemu_free_timer(s->timer);
     g_free(s->buffer);
@@ -242,6 +272,7 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)
 
     s->migration_state = migration_state;
     s->xfer_limit = migration_state->bandwidth_limit / 10;
+    s->unfreeze_fd = -1;
 
     s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
                              buffered_close, buffered_rate_limit,
@@ -254,3 +285,11 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)
 
     return s->file;
 }
+
+void qemu_buffered_file_drain_buffer(void *buffered_file)
+{
+    QEMUFileBuffered *s = buffered_file;
+    s->no_limit = true;
+    buffered_drain(s);
+    s->no_limit = false;
+}
diff --git a/buffered_file.h b/buffered_file.h
index ef010fe..be714a7 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -18,5 +18,6 @@
 #include "migration.h"
 
 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state);
+void qemu_buffered_file_drain_buffer(void *buffered_file);
 
 #endif
diff --git a/qemu-file.h b/qemu-file.h
index 452efcd..8074df1 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -76,6 +76,7 @@ typedef struct QEMUFileBuf QEMUFileBuf;
 QEMUFileBuf *qemu_fopen_buf_write(void);
 QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size);
 int qemu_file_fd(QEMUFile *f);
+void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
diff --git a/savevm.c b/savevm.c
index 7e55dce..93c51ab 100644
--- a/savevm.c
+++ b/savevm.c
@@ -86,6 +86,7 @@
 #include "memory.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "buffered_file.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -558,6 +559,12 @@ int qemu_fflush(QEMUFile *f)
     return ret;
 }
 
+void qemu_buffered_file_drain(QEMUFile *f)
+{
+    qemu_fflush(f);
+    qemu_buffered_file_drain_buffer(f->opaque);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
     int len;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread
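
Editorial note on patch 11/35: draining means writing until nothing is left
and, when the descriptor is not ready, waiting for it to become writable
rather than waiting for a timer. A minimal POSIX sketch of that idea is
shown below. drain_to_fd() is a hypothetical name, and the sketch blocks in
select() directly, whereas the patch registers a handler with QEMU's main
loop via qemu_set_fd_handler().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf to fd, waiting when it is full. */
static int drain_to_fd(int fd, const unsigned char *buf, size_t len)
{
    size_t off = 0;

    assert(buf != NULL || len == 0);

    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n > 0) {
            off += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;                   /* interrupted: retry the write */
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            fd_set wfds;

            /* Output is "frozen": sleep until fd is writable again. */
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0 &&
                errno != EINTR) {
                return -1;
            }
            continue;
        }
        return -1;                      /* hard error or unexpected 0 */
    }

    return 0;
}
```

On a blocking descriptor the EAGAIN branch is never taken and the loop
simply handles short writes.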

* [Qemu-devel] [PATCH v3 12/35] arch_init: export RAM_SAVE_xxx flags for postcopy
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (10 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 11/35] savevm, buffered_file: introduce method to drain buffer of buffered file Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 13/35] arch_init/ram_save: introduce constant for ram save version = 4 Isaku Yamahata
                   ` (25 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

These constants will also be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |    8 --------
 arch_init.h |    8 ++++++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a312434..4b65221 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -106,14 +106,6 @@ const uint32_t arch_type = QEMU_ARCH;
 /***********************************************************/
 /* ram save/restore */
 
-#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE     0x08
-#define RAM_SAVE_FLAG_EOS      0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-#define RAM_SAVE_FLAG_XBZRLE   0x40
-
 #ifdef __ALTIVEC__
 #include <altivec.h>
 #define VECTYPE        vector unsigned char
diff --git a/arch_init.h b/arch_init.h
index d9c572a..e4c131e 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -36,4 +36,12 @@ int xen_available(void);
 
 CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 
+#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_COMPRESS 0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE     0x08
+#define RAM_SAVE_FLAG_EOS      0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+#define RAM_SAVE_FLAG_XBZRLE   0x40
+
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Qemu-devel] [PATCH v3 13/35] arch_init/ram_save: introduce constant for ram save version = 4
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (11 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 12/35] arch_init: export RAM_SAVE_xxx flags for postcopy Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 14/35] arch_init: refactor ram_save_block() and export ram_save_block() Isaku Yamahata
                   ` (24 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Introduce RAM_SAVE_VERSION_ID to represent the version_id of the ram save
format.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |    2 +-
 arch_init.h |    2 ++
 vl.c        |    3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 4b65221..23717d3 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -784,7 +784,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
     seq_iter++;
 
-    if (version_id < 4 || version_id > 4) {
+    if (version_id < 4 || version_id > RAM_SAVE_VERSION_ID) {
         return -EINVAL;
     }
 
diff --git a/arch_init.h b/arch_init.h
index e4c131e..780eedf 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -44,4 +44,6 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 #define RAM_SAVE_FLAG_XBZRLE   0x40
 
+#define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
+
 #endif
diff --git a/vl.c b/vl.c
index ee3c43a..723fc59 100644
--- a/vl.c
+++ b/vl.c
@@ -3557,7 +3557,8 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_sdcard, snapshot, machine->use_scsi,
                   IF_SD, 0, SD_OPTS);
 
-    register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
+    register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID,
+                         &savevm_ram_handlers, NULL);
 
     if (nb_numa_nodes > 0) {
         int i;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Qemu-devel] [PATCH v3 14/35] arch_init: refactor ram_save_block() and export ram_save_block()
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (12 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 13/35] arch_init/ram_save: introduce constant for ram save version = 4 Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 15/35] arch_init/ram_save_setup: factor out bitmap alloc/free Isaku Yamahata
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

arch_init: factor out counting transferred bytes.
This will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- manual rebase
- report ram_save_block

Changes v1 -> v2:
- don't refer to last_block, which can be NULL.
  Also avoid a possible infinite loop.
---
 arch_init.c |  122 +++++++++++++++++++++++++++++++----------------------------
 arch_init.h |    5 +++
 migration.h |    1 +
 3 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 23717d3..ad1b01b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -399,59 +399,77 @@ static void migration_bitmap_sync(void)
     }
 }
 
+static uint64_t bytes_transferred;
+
+/*
+ * ram_save_page: Writes a page of memory to the stream f
+ *
+ * Returns:  true:  page written
+ *           false: no page written
+ */
+static const RAMBlock *last_sent_block = NULL;
+bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
+                   bool last_stage)
+{
+    MemoryRegion *mr = block->mr;
+    uint8_t *p;
+    int cont;
+    int bytes_sent = -1;
+    ram_addr_t current_addr;
+
+    if (!migration_bitmap_test_and_reset_dirty(mr, offset)) {
+        return false;
+    }
+
+    cont = (block == last_sent_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
+    last_sent_block = block;
+    p = memory_region_get_ram_ptr(mr) + offset;
+    if (is_dup_page(p)) {
+        acct_info.dup_pages++;
+        save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
+        qemu_put_byte(f, *p);
+        bytes_sent = 1;
+    } else if (migrate_use_xbzrle()) {
+        current_addr = block->offset + offset;
+        bytes_sent = save_xbzrle_page(f, p, current_addr, block,
+                                      offset, cont, last_stage);
+        if (!last_stage) {
+            p = get_cached_data(XBZRLE.cache, current_addr);
+        }
+    }
+
+    /* either we didn't send yet (we may have had XBZRLE overflow) */
+    if (bytes_sent == -1) {
+        save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
+        qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+        bytes_sent = TARGET_PAGE_SIZE;
+        acct_info.norm_pages++;
+    }
+
+    bytes_transferred += bytes_sent;
+    return true;
+}
+
 /*
  * ram_save_block: Writes a page of memory to the stream f
  *
- * Returns:  0: if the page hasn't changed
- *          -1: if there are no more dirty pages
- *           n: the amount of bytes written in other case
+ * Returns: true:  there may be more dirty pages
+ *          false: if there are no more dirty pages
  */
 
-static int ram_save_block(QEMUFile *f, bool last_stage)
+bool ram_save_block(QEMUFile *f, bool last_stage)
 {
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
-    int bytes_sent = -1;
-    MemoryRegion *mr;
-    ram_addr_t current_addr;
+    bool wrote = false;
 
     if (!block)
         block = QLIST_FIRST(&ram_list.blocks);
 
     do {
-        mr = block->mr;
-        if (migration_bitmap_test_and_reset_dirty(mr, offset)) {
-            uint8_t *p;
-            int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
-
-            p = memory_region_get_ram_ptr(mr) + offset;
-
-            if (is_dup_page(p)) {
-                acct_info.dup_pages++;
-                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-                bytes_sent = 1;
-            } else if (migrate_use_xbzrle()) {
-                current_addr = block->offset + offset;
-                bytes_sent = save_xbzrle_page(f, p, current_addr, block,
-                                              offset, cont, last_stage);
-                if (!last_stage) {
-                    p = get_cached_data(XBZRLE.cache, current_addr);
-                }
-            }
-
-            /* either we didn't send yet (we may have had XBZRLE overflow) */
-            if (bytes_sent == -1) {
-                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
-                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
-                bytes_sent = TARGET_PAGE_SIZE;
-                acct_info.norm_pages++;
-            }
-
-            /* if page is unmodified, continue to the next */
-            if (bytes_sent != 0) {
-                break;
-            }
+        wrote = ram_save_page(f, block, offset, last_stage);
+        if (wrote) {
+            break;
         }
 
         offset += TARGET_PAGE_SIZE;
@@ -466,11 +484,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
     last_block = block;
     last_offset = offset;
 
-    return bytes_sent;
+    return wrote;
 }
 
-static uint64_t bytes_transferred;
-
 static ram_addr_t ram_save_remaining(void)
 {
     return migration_dirty_pages;
@@ -547,6 +563,7 @@ static void ram_migration_cancel(void *opaque)
 
 static void reset_ram_globals(void)
 {
+    last_sent_block = NULL;
     last_block = NULL;
     last_offset = 0;
     last_version = ram_list.version;
@@ -618,14 +635,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 
     i = 0;
     while ((ret = qemu_file_rate_limit(f)) == 0) {
-        int bytes_sent;
-
-        bytes_sent = ram_save_block(f, false);
-        /* no more blocks to sent */
-        if (bytes_sent < 0) {
+        if (!ram_save_block(f, false)) {
+            /* no more blocks to send */
             break;
         }
-        bytes_transferred += bytes_sent;
         acct_info.iterations++;
         /* we want to check in the 1st loop, just in case it was the 1st time
            and we had to sync the dirty bitmap.
@@ -683,15 +696,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     /* try transferring iterative blocks of memory */
 
     /* flush all remaining blocks regardless of rate limiting */
-    while (true) {
-        int bytes_sent;
-
-        bytes_sent = ram_save_block(f, true);
-        /* no more blocks to sent */
-        if (bytes_sent < 0) {
-            break;
-        }
-        bytes_transferred += bytes_sent;
+    while (ram_save_block(f, true)) {
+        /* nothing */
     }
     memory_global_dirty_log_stop();
 
diff --git a/arch_init.h b/arch_init.h
index 780eedf..f2a7ae5 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -46,4 +46,9 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 
 #define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
 
+#if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
+                   bool last_stage);
+#endif
+
 #endif
diff --git a/migration.h b/migration.h
index 1c3e9b7..7d1b62d 100644
--- a/migration.h
+++ b/migration.h
@@ -91,6 +91,7 @@ bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 MigrationState *migrate_get_current(void);
 
+bool ram_save_block(QEMUFile *f, bool last_stage);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
-- 
1.7.10.4
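A note on the new return convention: ram_save_block() now reports whether
a page was written, so a caller that flushes all remaining dirty pages
keeps looping while it returns true. The following is a minimal sketch of
that calling pattern only. toy_dirty, toy_save_block() and toy_drain() are
hypothetical stand-ins invented for illustration, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8

/* Hypothetical stand-in for the migration dirty bitmap. */
static bool toy_dirty[TOY_PAGES];
/* Accounting is done by the callee, as in the refactored code. */
static size_t toy_sent;

/* Returns true if a dirty page was found and "sent", false if none remain. */
static bool toy_save_block(void)
{
    for (size_t i = 0; i < TOY_PAGES; i++) {
        if (toy_dirty[i]) {
            toy_dirty[i] = false;
            toy_sent++;
            return true;
        }
    }
    return false;
}

/* Flush every remaining dirty page; returns how many pages were sent. */
static size_t toy_drain(void)
{
    size_t before = toy_sent;

    while (toy_save_block()) {
        /* nothing: the callee already did the accounting */
    }
    return toy_sent - before;
}
```

With three pages marked dirty, toy_drain() sends exactly three pages and a
second call sends none.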


* [Qemu-devel] [PATCH v3 15/35] arch_init/ram_save_setup: factor out bitmap alloc/free
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (13 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 14/35] arch_init: refactor ram_save_block() and export ram_save_block() Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 16/35] arch_init/ram_load: refactor ram_load Isaku Yamahata
                   ` (22 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- new
---
 arch_init.c |   25 ++++++++++++++++++-------
 migration.h |    2 ++
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index ad1b01b..7e6d84e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -330,6 +330,22 @@ static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
 static uint32_t last_version;
 
+void migration_bitmap_init(void)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+    if (!migration_bitmap) {
+        migration_bitmap = bitmap_new(ram_pages);
+    }
+    bitmap_set(migration_bitmap, 1, ram_pages);
+    migration_dirty_pages = ram_pages;
+}
+
+void migration_bitmap_free(void)
+{
+    g_free(migration_bitmap);
+    migration_bitmap = NULL;
+}
+
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
                                                          ram_addr_t offset)
 {
@@ -575,11 +591,7 @@ static void reset_ram_globals(void)
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
-    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
-
-    migration_bitmap = bitmap_new(ram_pages);
-    bitmap_set(migration_bitmap, 1, ram_pages);
-    migration_dirty_pages = ram_pages;
+    migration_bitmap_init();
 
     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
@@ -704,8 +716,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
-    g_free(migration_bitmap);
-    migration_bitmap = NULL;
+    migration_bitmap_free();
 
     return 0;
 }
diff --git a/migration.h b/migration.h
index 7d1b62d..73416ba 100644
--- a/migration.h
+++ b/migration.h
@@ -95,6 +95,8 @@ bool ram_save_block(QEMUFile *f, bool last_stage);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
+void migration_bitmap_init(void);
+void migration_bitmap_free(void);
 
 extern SaveVMHandlers savevm_ram_handlers;
 
-- 
1.7.10.4
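For readers who want to see the allocate-if-missing / free-and-reset
pattern in isolation, here is a small sketch. It assumes a plain calloc()
bitmap instead of QEMU's bitmap_new(), and every toy_* name is
hypothetical. The sketch marks bits 0 through nr_pages - 1 and refuses a
second init with a different size, which is a simplification of my own.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long *toy_bitmap;  /* NULL while no bitmap is allocated */
static size_t toy_bitmap_pages;    /* number of pages the bitmap covers */
static size_t toy_dirty_pages;

/* Allocate the bitmap on first use, then mark every page dirty. */
static int toy_bitmap_init(size_t nr_pages)
{
    if (!toy_bitmap) {
        size_t nlongs = (nr_pages + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;

        toy_bitmap = calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
        if (!toy_bitmap) {
            return -1;
        }
        toy_bitmap_pages = nr_pages;
    } else if (nr_pages != toy_bitmap_pages) {
        return -1; /* sketch only: resizing is not handled */
    }
    for (size_t i = 0; i < nr_pages; i++) {
        toy_bitmap[i / TOY_BITS_PER_LONG] |= 1UL << (i % TOY_BITS_PER_LONG);
    }
    toy_dirty_pages = nr_pages;
    return 0;
}

static bool toy_bitmap_test(size_t page)
{
    return (toy_bitmap[page / TOY_BITS_PER_LONG] >>
            (page % TOY_BITS_PER_LONG)) & 1UL;
}

/* Safe to call repeatedly: free(NULL) is a no-op. */
static void toy_bitmap_free(void)
{
    free(toy_bitmap);
    toy_bitmap = NULL;
    toy_bitmap_pages = 0;
    toy_dirty_pages = 0;
}
```

Calling toy_bitmap_init() twice reuses the first allocation, and calling
toy_bitmap_free() twice is harmless because the pointer is reset to NULL.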


* [Qemu-devel] [PATCH v3 16/35] arch_init/ram_load: refactor ram_load
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (14 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 15/35] arch_init/ram_save_setup: factor out bitmap alloc/free Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 17/35] arch_init: factor out logic to find ram block with id string Isaku Yamahata
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

ram_load_page() will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- new
---
 arch_init.c |  137 +++++++++++++++++++++++++++++++----------------------------
 arch_init.h |    3 ++
 2 files changed, 74 insertions(+), 66 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 7e6d84e..c77e24d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -721,7 +721,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
+static int load_xbzrle(QEMUFile *f, void *host)
 {
     int ret, rc = 0;
     unsigned int xh_len;
@@ -792,12 +792,73 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     return NULL;
 }
 
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    /* Synchronize RAM block list */
+    char id[256];
+    ram_addr_t length;
+
+    while (total_ram_bytes) {
+        RAMBlock *block;
+        uint8_t len;
+
+        len = qemu_get_byte(f);
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (!strncmp(id, block->idstr, sizeof(id))) {
+                if (block->length != length)
+                    return -EINVAL;
+                break;
+            }
+        }
+
+        if (!block) {
+            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                    "accept migration\n", id);
+            return -EINVAL;
+        }
+
+        total_ram_bytes -= length;
+    }
+
+    return 0;
+}
+
+int ram_load_page(QEMUFile *f, void *host, int flags)
+{
+    if (flags & RAM_SAVE_FLAG_COMPRESS) {
+        uint8_t ch;
+        ch = qemu_get_byte(f);
+        memset(host, ch, TARGET_PAGE_SIZE);
+#ifndef _WIN32
+        if (ch == 0 &&
+            (!kvm_enabled() || kvm_has_sync_mmu())) {
+            qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
+        }
+#endif
+    } else if (flags & RAM_SAVE_FLAG_PAGE) {
+        qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+    } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
+        if (!migrate_use_xbzrle()) {
+            return -EINVAL;
+        }
+        if (load_xbzrle(f, host) < 0) {
+            return -EINVAL;
+        }
+    }
+    return 0;
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
     int flags, ret = 0;
     int error;
     static uint64_t seq_iter;
+    void *host;
 
     seq_iter++;
 
@@ -813,82 +874,26 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & RAM_SAVE_FLAG_MEM_SIZE) {
             if (version_id == 4) {
-                /* Synchronize RAM block list */
-                char id[256];
-                ram_addr_t length;
-                ram_addr_t total_ram_bytes = addr;
-
-                while (total_ram_bytes) {
-                    RAMBlock *block;
-                    uint8_t len;
-
-                    len = qemu_get_byte(f);
-                    qemu_get_buffer(f, (uint8_t *)id, len);
-                    id[len] = 0;
-                    length = qemu_get_be64(f);
-
-                    QLIST_FOREACH(block, &ram_list.blocks, next) {
-                        if (!strncmp(id, block->idstr, sizeof(id))) {
-                            if (block->length != length) {
-                                ret =  -EINVAL;
-                                goto done;
-                            }
-                            break;
-                        }
-                    }
-
-                    if (!block) {
-                        fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-                                "accept migration\n", id);
-                        ret = -EINVAL;
-                        goto done;
-                    }
-
-                    total_ram_bytes -= length;
+                error = ram_load_mem_size(f, addr);
+                if (error) {
+                    DPRINTF("error %d\n", error);
+                    return error;
                 }
             }
         }
 
-        if (flags & RAM_SAVE_FLAG_COMPRESS) {
-            void *host;
-            uint8_t ch;
-
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                return -EINVAL;
-            }
-
-            ch = qemu_get_byte(f);
-            memset(host, ch, TARGET_PAGE_SIZE);
-#ifndef _WIN32
-            if (ch == 0 &&
-                (!kvm_enabled() || kvm_has_sync_mmu())) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
-            }
-#endif
-        } else if (flags & RAM_SAVE_FLAG_PAGE) {
-            void *host;
-
+        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
+                     RAM_SAVE_FLAG_XBZRLE)) {
             host = host_from_stream_offset(f, addr, flags);
             if (!host) {
                 return -EINVAL;
             }
-
-            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
-        } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
-            if (!migrate_use_xbzrle()) {
-                return -EINVAL;
-            }
-            void *host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                return -EINVAL;
-            }
-
-            if (load_xbzrle(f, addr, host) < 0) {
-                ret = -EINVAL;
+            ret = ram_load_page(f, host, flags);
+            if (ret) {
                 goto done;
             }
         }
+
         error = qemu_file_get_error(f);
         if (error) {
             ret = error;
diff --git a/arch_init.h b/arch_init.h
index f2a7ae5..bca1a29 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -46,9 +46,12 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 
 #define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
 
+int ram_load_page(QEMUFile *f, void *host, int flags);
+
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                    bool last_stage);
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
 #endif
 
 #endif
-- 
1.7.10.4
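The split above leaves the caller to resolve the host address and lets
ram_load_page() dispatch on the flags alone. A rough, self-contained
sketch of that dispatch follows. It assumes an in-memory byte stream in
place of QEMUFile and a 16-byte toy page. ToyStream, toy_get(),
toy_load_page() and the TOY_FLAG_* values are hypothetical, and the
XBZRLE branch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical flag values, chosen only for this sketch. */
enum { TOY_FLAG_FILL = 0x02, TOY_FLAG_RAW = 0x08 };

/* Hypothetical in-memory replacement for the migration stream. */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} ToyStream;

/* Copy n bytes out of the stream; fails if fewer than n bytes remain. */
static int toy_get(ToyStream *s, uint8_t *dst, size_t n)
{
    if (s->len - s->pos < n) {
        return -1;
    }
    memcpy(dst, s->buf + s->pos, n);
    s->pos += n;
    return 0;
}

/* Fill one page at host according to flags; 0 on success, -1 on error. */
static int toy_load_page(ToyStream *s, uint8_t *host, int flags)
{
    if (flags & TOY_FLAG_FILL) {
        uint8_t ch;

        /* a "duplicate" page travels as a single fill byte */
        if (toy_get(s, &ch, 1) < 0) {
            return -1;
        }
        memset(host, ch, TOY_PAGE_SIZE);
    } else if (flags & TOY_FLAG_RAW) {
        /* a normal page travels as raw page contents */
        if (toy_get(s, host, TOY_PAGE_SIZE) < 0) {
            return -1;
        }
    }
    return 0;
}
```

Because the helper takes a host pointer instead of a stream offset, a
caller that already knows where the page belongs can reuse it unchanged.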


* [Qemu-devel] [PATCH v3 17/35] arch_init: factor out logic to find ram block with id string
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (15 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 16/35] arch_init/ram_load: refactor ram_load Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 18/35] migration: export migrate_fd_completed() and migrate_fd_cleanup() Isaku Yamahata
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   31 ++++++++++++++++++++-----------
 arch_init.h |    1 +
 exec.c      |   12 ++++++------
 3 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c77e24d..d82316d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -762,6 +762,19 @@ static int load_xbzrle(QEMUFile *f, void *host)
     return rc;
 }
 
+RAMBlock *ram_find_block(const char *id, uint8_t len)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (!strncmp(id, block->idstr, len) && !block->idstr[len]) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 static inline void *host_from_stream_offset(QEMUFile *f,
                                             ram_addr_t offset,
                                             int flags)
@@ -783,9 +796,9 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     qemu_get_buffer(f, (uint8_t *)id, len);
     id[len] = 0;
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
-            return memory_region_get_ram_ptr(block->mr) + offset;
+    block = ram_find_block(id, len);
+    if (block) {
+        return memory_region_get_ram_ptr(block->mr) + offset;
     }
 
     fprintf(stderr, "Can't find block %s!\n", id);
@@ -807,19 +820,15 @@ int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
         id[len] = 0;
         length = qemu_get_be64(f);
 
-        QLIST_FOREACH(block, &ram_list.blocks, next) {
-            if (!strncmp(id, block->idstr, sizeof(id))) {
-                if (block->length != length)
-                    return -EINVAL;
-                break;
-            }
-        }
-
+        block = ram_find_block(id, len);
         if (!block) {
             fprintf(stderr, "Unknown ramblock \"%s\", cannot "
                     "accept migration\n", id);
             return -EINVAL;
         }
+        if (block->length != length) {
+            return -EINVAL;
+        }
 
         total_ram_bytes -= length;
     }
diff --git a/arch_init.h b/arch_init.h
index bca1a29..499d0f1 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -51,6 +51,7 @@ int ram_load_page(QEMUFile *f, void *host, int flags);
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                    bool last_stage);
+RAMBlock *ram_find_block(const char *id, uint8_t len);
 int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
 #endif
 
diff --git a/exec.c b/exec.c
index 1414654..2aa4d90 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include "kvm.h"
 #include "hw/xen.h"
 #include "qemu-timer.h"
+#include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
 #if defined(CONFIG_USER_ONLY)
@@ -2517,12 +2518,11 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
     pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
     qemu_mutex_lock_ramlist();
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
-            fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
-                    new_block->idstr);
-            abort();
-        }
+    block = ram_find_block(new_block->idstr, strlen(new_block->idstr));
+    if (block != new_block) {
+        fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
+                new_block->idstr);
+        abort();
     }
     qemu_mutex_unlock_ramlist();
 }
-- 
1.7.10.4
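One point worth checking when a lookup takes an explicit length: comparing
only the first len bytes is a prefix match, so a short id can hit a block
with a longer name. The sketch below contrasts the two behaviours under
that assumption. ToyBlock, toy_blocks and both toy_find_* functions are
hypothetical, and the block names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of a RAM block: only the id string matters here. */
typedef struct {
    char idstr[256];
} ToyBlock;

static ToyBlock toy_blocks[] = { { "vga.vram" }, { "vga" }, { "pc.ram" } };
#define TOY_NBLOCKS (sizeof(toy_blocks) / sizeof(toy_blocks[0]))

/* Prefix comparison: matches any block whose name starts with id. */
static ToyBlock *toy_find_prefix(const char *id, uint8_t len)
{
    for (size_t i = 0; i < TOY_NBLOCKS; i++) {
        if (!strncmp(id, toy_blocks[i].idstr, len)) {
            return &toy_blocks[i];
        }
    }
    return NULL;
}

/* Exact comparison: the block name must also end at offset len. */
static ToyBlock *toy_find_exact(const char *id, uint8_t len)
{
    for (size_t i = 0; i < TOY_NBLOCKS; i++) {
        ToyBlock *b = &toy_blocks[i];

        /* len <= 255, so idstr[len] stays inside the 256-byte array */
        if (!strncmp(id, b->idstr, len) && b->idstr[len] == '\0') {
            return b;
        }
    }
    return NULL;
}
```

Here toy_find_prefix("vga", 3) returns the "vga.vram" entry because it
comes first in the list, while toy_find_exact("vga", 3) returns the "vga"
entry.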


* [Qemu-devel] [PATCH v3 18/35] migration: export migrate_fd_completed() and migrate_fd_cleanup()
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (16 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 17/35] arch_init: factor out logic to find ram block with id string Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 19/35] uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh Isaku Yamahata
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This will be used by postcopy migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 migration.c |    4 ++--
 migration.h |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index 8fcb466..00b0bc2 100644
--- a/migration.c
+++ b/migration.c
@@ -242,7 +242,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
-static int migrate_fd_cleanup(MigrationState *s)
+int migrate_fd_cleanup(MigrationState *s)
 {
     int ret = 0;
 
@@ -272,7 +272,7 @@ void migrate_fd_error(MigrationState *s)
     migrate_fd_cleanup(s);
 }
 
-static void migrate_fd_completed(MigrationState *s)
+void migrate_fd_completed(MigrationState *s)
 {
     DPRINTF("setting completed state\n");
     if (migrate_fd_cleanup(s) < 0) {
diff --git a/migration.h b/migration.h
index 73416ba..2d27738 100644
--- a/migration.h
+++ b/migration.h
@@ -74,7 +74,9 @@ int fd_start_incoming_migration(const char *path);
 
 int fd_start_outgoing_migration(MigrationState *s, const char *fdname);
 
+int migrate_fd_cleanup(MigrationState *s);
 void migrate_fd_error(MigrationState *s);
+void migrate_fd_completed(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s);
 
-- 
1.7.10.4


* [Qemu-devel] [PATCH v3 19/35] uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (17 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 18/35] migration: export migrate_fd_completed() and migrate_fd_cleanup() Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 20/35] osdep: add QEMU_MADV_REMOVE and trivial fix Isaku Yamahata
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 linux-headers/linux/uvmem.h     |   41 +++++++++++++++++++++++++++++++++++++++
 scripts/update-linux-headers.sh |    2 +-
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/uvmem.h

diff --git a/linux-headers/linux/uvmem.h b/linux-headers/linux/uvmem.h
new file mode 100644
index 0000000..ea88980
--- /dev/null
+++ b/linux-headers/linux/uvmem.h
@@ -0,0 +1,41 @@
+/*
+ * User process backed memory.
+ * This is mainly for KVM post copy.
+ *
+ * Copyright (c) 2011,
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __LINUX_UVMEM_H
+#define __LINUX_UVMEM_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+struct uvmem_init {
+	__u64 size;		/* in bytes */
+	__s32 shmem_fd;
+	__s32 padding;
+};
+
+#define UVMEMIO	0x1E
+
+/* ioctl for uvmem fd */
+#define UVMEM_INIT			_IOWR(UVMEMIO, 0x0, struct uvmem_init)
+
+#endif /* __LINUX_UVMEM_H */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 67be2ef..0fa25ce 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -57,7 +57,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h kvm_para.h vfio.h vhost.h virtio_config.h virtio_ring.h; do
+for header in kvm.h kvm_para.h vfio.h vhost.h virtio_config.h virtio_ring.h uvmem.h; do
     cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 rm -rf "$output/linux-headers/asm-generic"
-- 
1.7.10.4
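To illustrate how the imported definitions are meant to be used, here is a
hedged sketch of a wrapper around the init ioctl. It declares a local
mirror of struct uvmem_init with fixed-width types so that it builds
without the uvmem header installed. All toy_* and TOY_* names are
hypothetical, and the assumption that the driver returns the shmem fd in
the structure is taken from how the series uses the field after the call.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Local mirror of struct uvmem_init; layout copied from the header. */
struct toy_uvmem_init {
    uint64_t size;      /* in bytes */
    int32_t shmem_fd;
    int32_t padding;
};

#define TOY_UVMEMIO     0x1E
#define TOY_UVMEM_INIT  _IOWR(TOY_UVMEMIO, 0x0, struct toy_uvmem_init)

/*
 * Ask the device behind fd to set up a region of the given size.
 * Returns 0 and stores the shmem fd on success, -errno on failure.
 */
static int toy_uvmem_init(int fd, uint64_t size, int *shmem_fd)
{
    struct toy_uvmem_init uinit = {
        .size = size,
        .shmem_fd = -1,
    };

    if (ioctl(fd, TOY_UVMEM_INIT, &uinit) < 0) {
        return -errno;
    }
    *shmem_fd = uinit.shmem_fd;
    return 0;
}
```

Passing an invalid descriptor exercises the error path without the kernel
module: the call fails with EBADF and the output argument is left alone.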


* [Qemu-devel] [PATCH v3 20/35] osdep: add QEMU_MADV_REMOVE and trivial fix
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (18 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 19/35] uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 21/35] postcopy: introduce helper functions for postcopy Isaku Yamahata
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

MADV_REMOVE will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 osdep.h |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/osdep.h b/osdep.h
index c5fd3d9..9e97f39 100644
--- a/osdep.h
+++ b/osdep.h
@@ -113,6 +113,11 @@ void qemu_vfree(void *ptr);
 #else
 #define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
 #endif
+#ifdef MADV_REMOVE
+#define QEMU_MADV_REMOVE MADV_REMOVE
+#else
+#define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#endif
 
 #elif defined(CONFIG_POSIX_MADVISE)
 
@@ -120,7 +125,9 @@ void qemu_vfree(void *ptr);
 #define QEMU_MADV_DONTNEED  POSIX_MADV_DONTNEED
 #define QEMU_MADV_DONTFORK  QEMU_MADV_INVALID
 #define QEMU_MADV_MERGEABLE QEMU_MADV_INVALID
-#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
+#define QEMU_MADV_DONTDUMP  QEMU_MADV_INVALID
+#define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
+#define QEMU_MADV_REMOVE    QEMU_MADV_INVALID
 
 #else /* no-op */
 
@@ -128,7 +135,9 @@ void qemu_vfree(void *ptr);
 #define QEMU_MADV_DONTNEED  QEMU_MADV_INVALID
 #define QEMU_MADV_DONTFORK  QEMU_MADV_INVALID
 #define QEMU_MADV_MERGEABLE QEMU_MADV_INVALID
-#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
+#define QEMU_MADV_DONTDUMP  QEMU_MADV_INVALID
+#define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
+#define QEMU_MADV_REMOVE    QEMU_MADV_INVALID
 
 #endif
 
-- 
1.7.10.4
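The point of defining every QEMU_MADV_* constant on every branch is that
callers can pass the constant unconditionally and get a clean failure
where the host lacks the advice. A minimal sketch of that pattern follows,
assuming the invalid marker is -1 and that the wrapper fails with EINVAL.
TOY_MADV_INVALID, TOY_MADV_REMOVE and toy_madvise() are hypothetical names
for this example.

```c
#define _DEFAULT_SOURCE /* expose madvise() and MADV_* on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Marker for "this advice is not available on this host". */
#define TOY_MADV_INVALID (-1)

#ifdef MADV_REMOVE
#define TOY_MADV_REMOVE MADV_REMOVE
#else
#define TOY_MADV_REMOVE TOY_MADV_INVALID
#endif

/* Reject unavailable advice up front instead of handing -1 to the kernel. */
static int toy_madvise(void *addr, size_t len, int advice)
{
    if (advice == TOY_MADV_INVALID) {
        errno = EINVAL;
        return -1;
    }
    return madvise(addr, len, advice);
}
```

Callers then write toy_madvise(p, len, TOY_MADV_REMOVE) without any #ifdef
at the call site and only need to check the return value.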


* [Qemu-devel] [PATCH v3 21/35] postcopy: introduce helper functions for postcopy
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (19 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 20/35] osdep: add QEMU_MADV_REMOVE and trivial fix Isaku Yamahata
@ 2012-10-30  8:32 ` Isaku Yamahata
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 22/35] savevm: add new section that is used by postcopy Isaku Yamahata
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This patch introduces helper functions for postcopy to access the
uvmem character device and to communicate between the incoming QEMU and umemd.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
changes v2 -> v3:
- error check, don't abort
- typedef
- #ifdef CONFIG_LINUX
- code simplification

changes v1 -> v2:
- code simplification
- make fault trigger more robust
- introduce struct umem_pages
---
 umem.c |  291 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 umem.h |   88 ++++++++++++++++++++
 2 files changed, 379 insertions(+)
 create mode 100644 umem.c
 create mode 100644 umem.h

diff --git a/umem.c b/umem.c
new file mode 100644
index 0000000..b05377b
--- /dev/null
+++ b/umem.c
@@ -0,0 +1,291 @@
+/*
+ * umem.c: user process backed memory module for postcopy live migration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include "config-host.h"
+#ifdef CONFIG_LINUX
+#include <linux/uvmem.h>
+#endif
+
+#include "bitops.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "umem.h"
+
+//#define DEBUG_UMEM
+#ifdef DEBUG_UMEM
+#define DPRINTF(format, ...)                                            \
+    do {                                                                \
+        printf("%s:%d "format, __func__, __LINE__, ## __VA_ARGS__);     \
+    } while (0)
+#else
+#define DPRINTF(format, ...)    do { } while (0)
+#endif
+
+#define DEV_UMEM        "/dev/uvmem"
+
+int umem_new(void *hostp, size_t size, UMem** umemp)
+{
+#ifdef CONFIG_LINUX
+    struct uvmem_init uinit = {
+        .size = size,
+        .shmem_fd = -1,
+    };
+    UMem *umem;
+    int error;
+
+    assert((size % getpagesize()) == 0);
+    umem = g_new(UMem, 1);
+    umem->fd = open(DEV_UMEM, O_RDWR);
+    if (umem->fd < 0) {
+        error = -errno;
+        perror("can't open "DEV_UMEM);
+        goto error;
+    }
+
+    if (ioctl(umem->fd, UVMEM_INIT, &uinit) < 0) {
+        error = -errno;
+        perror("UVMEM_INIT failed");
+        goto error;
+    }
+    if (ftruncate(uinit.shmem_fd, uinit.size) < 0) {
+        error = -errno;
+        perror("ftruncate(\"shmem_fd\") failed");
+        goto error;
+    }
+
+    umem->nbits = 0;
+    umem->nsets = 0;
+    umem->faulted = NULL;
+    umem->page_shift = ffs(getpagesize()) - 1;
+    umem->shmem_fd = uinit.shmem_fd;
+    umem->size = uinit.size;
+    umem->umem = mmap(hostp, size, PROT_EXEC | PROT_READ | PROT_WRITE,
+                      MAP_PRIVATE | MAP_FIXED, umem->fd, 0);
+    if (umem->umem == MAP_FAILED) {
+        error = -errno;
+        perror("mmap(UMem) failed");
+        goto error;
+    }
+    *umemp = umem;
+    return 0;
+
+error:
+    if (umem->fd >= 0) {
+        close(umem->fd);
+    }
+    if (uinit.shmem_fd >= 0) {
+        close(uinit.shmem_fd);
+    }
+    g_free(umem);
+    return error;
+#else
+    fprintf(stderr, "postcopy migration is not supported\n");
+    return -ENOSYS;
+#endif
+}
+
+void umem_destroy(UMem *umem)
+{
+    if (umem->fd != -1) {
+        close(umem->fd);
+    }
+    if (umem->shmem_fd != -1) {
+        close(umem->shmem_fd);
+    }
+    g_free(umem->faulted);
+    g_free(umem);
+}
+
+size_t umem_pages_size(uint64_t nr)
+{
+    return sizeof(UMemPages) + nr * sizeof(uint64_t);
+}
+
+int umem_get_page_request(UMem *umem, UMemPages *page_request)
+{
+    ssize_t ret = read(umem->fd, page_request->pgoffs,
+                       page_request->nr * sizeof(page_request->pgoffs[0]));
+    if (ret < 0) {
+        if (errno != EINTR) {
+            perror("daemon: umem read failed");
+            return -errno;
+        }
+        ret = 0;
+    }
+    page_request->nr = ret / sizeof(page_request->pgoffs[0]);
+    return 0;
+}
+
+int umem_mark_page_cached(UMem *umem, UMemPages *page_cached)
+{
+    const void *buf = page_cached->pgoffs;
+    size_t size = page_cached->nr * sizeof(page_cached->pgoffs[0]);
+    ssize_t ret;
+
+    ret = qemu_write_full(umem->fd, buf, size);
+    if (ret != size) {
+        perror("daemon: umem write");
+        return -errno;
+    }
+    return 0;
+}
+
+void umem_unmap(UMem *umem)
+{
+    munmap(umem->umem, umem->size);
+    umem->umem = NULL;
+}
+
+void umem_close(UMem *umem)
+{
+    close(umem->fd);
+    umem->fd = -1;
+}
+
+int umem_map_shmem(UMem *umem)
+{
+    umem->nbits = umem->size >> umem->page_shift;
+    umem->nsets = 0;
+    umem->faulted = g_new0(unsigned long, BITS_TO_LONGS(umem->nbits));
+
+    umem->shmem = mmap(NULL, umem->size, PROT_READ | PROT_WRITE, MAP_SHARED,
+                       umem->shmem_fd, 0);
+    if (umem->shmem == MAP_FAILED) {
+        perror("daemon: mmap(\"shmem\")");
+        return -errno;
+    }
+    return 0;
+}
+
+void umem_unmap_shmem(UMem *umem)
+{
+    if (umem->shmem) {
+        munmap(umem->shmem, umem->size);
+        umem->shmem = NULL;
+    }
+}
+
+void umem_remove_shmem(UMem *umem, size_t offset, size_t size)
+{
+    size_t s = offset >> umem->page_shift;
+    size_t e = (offset + size) >> umem->page_shift;
+    size_t i;
+
+    for (i = s; i < e; i++) {
+        if (!test_and_set_bit(i, umem->faulted)) {
+            umem->nsets++;
+            qemu_madvise(umem->shmem + offset, size, QEMU_MADV_REMOVE);
+        }
+    }
+}
+
+bool umem_shmem_finished(const UMem *umem)
+{
+    return umem->nsets == umem->nbits;
+}
+
+void umem_close_shmem(UMem *umem)
+{
+    close(umem->shmem_fd);
+    umem->shmem_fd = -1;
+}
+
+/***************************************************************************/
+/* qemu main loop <-> umem thread communication */
+
+static int umem_write_cmd(int fd, uint8_t cmd)
+{
+    ssize_t size;
+
+    DPRINTF("write cmd %c\n", cmd);
+    size = qemu_write_full(fd, &cmd, sizeof(cmd));
+    if (size == 0) {
+        if (errno == EPIPE) {
+            perror("pipe is closed");
+            DPRINTF("write cmd %c %d: pipe is closed\n", cmd, errno);
+            return 0;
+        }
+        perror("fail to write to pipe");
+        DPRINTF("write cmd %c %d\n", cmd, errno);
+        return -1;
+    }
+    return 0;
+}
+
+static int umem_read_cmd(int fd, uint8_t expect)
+{
+    ssize_t size;
+    uint8_t cmd;
+
+    size = qemu_read_full(fd, &cmd, sizeof(cmd));
+    if (size == 0) {
+        DPRINTF("read cmd: pipe is closed\n");
+        return -1;
+    }
+
+    DPRINTF("read cmd %c\n", cmd);
+    if (cmd != expect) {
+        DPRINTF("cmd %c expect %c\n", cmd, expect);
+        return -1;
+    }
+    return 0;
+}
+
+/* umem thread -> qemu main loop */
+int umem_daemon_ready(int to_qemu_fd)
+{
+    return umem_write_cmd(to_qemu_fd, UMEM_DAEMON_READY);
+}
+
+int umem_daemon_wait_for_qemu(int from_qemu_fd)
+{
+    return umem_read_cmd(from_qemu_fd, UMEM_QEMU_READY);
+}
+
+void umem_daemon_quit(QEMUFile *to_qemu)
+{
+    qemu_put_byte(to_qemu, UMEM_DAEMON_QUIT);
+}
+
+void umem_daemon_error(QEMUFile *to_qemu)
+{
+    qemu_put_byte(to_qemu, UMEM_DAEMON_ERROR);
+}
+
+/* qemu main loop -> umem thread */
+int umem_qemu_wait_for_daemon(int from_umemd_fd)
+{
+    return umem_read_cmd(from_umemd_fd, UMEM_DAEMON_READY);
+}
+
+int umem_qemu_ready(int to_umemd_fd)
+{
+    return umem_write_cmd(to_umemd_fd, UMEM_QEMU_READY);
+}
+
+void umem_qemu_quit(QEMUFile *to_umemd)
+{
+    qemu_put_byte(to_umemd, UMEM_QEMU_QUIT);
+}
diff --git a/umem.h b/umem.h
new file mode 100644
index 0000000..dbc965c
--- /dev/null
+++ b/umem.h
@@ -0,0 +1,88 @@
+/*
+ * umem.h: user process backed memory module for postcopy live migration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_UMEM_H
+#define QEMU_UMEM_H
+
+#include "qemu-common.h"
+
+typedef struct UMemDev UMemDev;
+
+struct UMem {
+    void *umem;
+    int fd;
+    void *shmem;
+    int shmem_fd;
+    uint64_t size;
+
+    /* indexed by host page size */
+    int page_shift;
+    int nbits;
+    int nsets;
+    unsigned long *faulted;
+};
+typedef struct UMem UMem;
+
+struct UMemPages {
+    uint64_t nr;
+    uint64_t pgoffs[0];
+};
+typedef struct UMemPages UMemPages;
+
+int umem_new(void *hostp, size_t size, UMem** umemp);
+void umem_destroy(UMem *umem);
+
+/* umem device operations */
+size_t umem_pages_size(uint64_t nr);
+int umem_get_page_request(UMem *umem, UMemPages *page_request);
+int umem_mark_page_cached(UMem *umem, UMemPages *page_cached);
+void umem_unmap(UMem *umem);
+void umem_close(UMem *umem);
+
+/* umem shmem operations */
+int umem_map_shmem(UMem *umem);
+void umem_unmap_shmem(UMem *umem);
+void umem_remove_shmem(UMem *umem, size_t offset, size_t size);
+bool umem_shmem_finished(const UMem *umem);
+void umem_close_shmem(UMem *umem);
+
+/* umem thread -> qemu main loop */
+#define UMEM_DAEMON_READY               'R'
+#define UMEM_DAEMON_QUIT                'Q'
+#define UMEM_DAEMON_ERROR               'E'
+
+/* qemu main loop -> umem thread */
+#define UMEM_QEMU_READY                 'r'
+#define UMEM_QEMU_QUIT                  'q'
+
+/* for umem thread */
+int umem_daemon_ready(int to_qemu_fd);
+int umem_daemon_wait_for_qemu(int from_qemu_fd);
+void umem_daemon_quit(QEMUFile *to_qemu);
+void umem_daemon_error(QEMUFile *to_qemu);
+
+/* for qemu main loop */
+int umem_qemu_wait_for_daemon(int from_umemd_fd);
+int umem_qemu_ready(int to_umemd_fd);
+void umem_qemu_quit(QEMUFile *to_umemd);
+
+#endif /* QEMU_UMEM_H */
-- 
1.7.10.4
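
A note on the umem.c half of this patch: the faulted bitmap together with
nbits/nsets is what umem_remove_shmem() and umem_shmem_finished() use to
decide when every host page has been handled. Below is a minimal standalone
sketch of that bookkeeping. The struct and the tracker_* names are made up
for illustration and are not part of the patch; the real code uses QEMU's
test_and_set_bit() and additionally calls qemu_madvise() to drop the shmem
pages.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the nbits/nsets/faulted fields of UMem. */
struct page_tracker {
    size_t nbits;            /* total number of host pages */
    size_t nsets;            /* pages marked so far */
    unsigned long *faulted;  /* one bit per host page */
};

static int tracker_init(struct page_tracker *t, size_t nbits)
{
    size_t nlongs = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;

    t->nbits = nbits;
    t->nsets = 0;
    t->faulted = calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
    return t->faulted ? 0 : -1;
}

/* Set bit i and report whether it was already set. */
static bool tracker_test_and_set(struct page_tracker *t, size_t i)
{
    unsigned long mask = 1UL << (i % BITS_PER_LONG);
    unsigned long *word = &t->faulted[i / BITS_PER_LONG];
    bool old = (*word & mask) != 0;

    assert(i < t->nbits);
    *word |= mask;
    return old;
}

/* Mark pages [first, last) as handled; returns how many were newly marked. */
static size_t tracker_mark(struct page_tracker *t, size_t first, size_t last)
{
    size_t newly = 0;

    for (size_t i = first; i < last && i < t->nbits; i++) {
        if (!tracker_test_and_set(t, i)) {
            t->nsets++;
            newly++;
        }
    }
    return newly;
}

/* Same completion test as umem_shmem_finished(): all bits are set. */
static bool tracker_finished(const struct page_tracker *t)
{
    return t->nsets == t->nbits;
}
```

Counting newly set bits, rather than rescanning the bitmap, keeps the
completion check O(1), which seems to be the point of carrying nsets next to
the bitmap.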


* [Qemu-devel] [PATCH v3 22/35] savevm: add new section that is used by postcopy
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Postcopy uses this section to tell the incoming side the total length of
the QEMU_VM_SECTION_FULL and QEMU_VM_SUBSECTION data sent by the outgoing
side.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 savevm.c |    4 ++++
 1 file changed, 4 insertions(+)
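
For illustration only, here is a rough sketch of how a destination could
react to the new section id. The three numeric values are the ones in the
hunk below; the enum, the function name and the knows_postcopy flag are
hypothetical and do not exist in savevm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Section type values as they appear in the savevm.c hunk. */
#define QEMU_VM_SECTION_FULL 0x04
#define QEMU_VM_SUBSECTION   0x05
#define QEMU_VM_POSTCOPY     0x10

/* Hypothetical outcome of looking at one section type byte. */
enum section_action {
    SECTION_LOAD,       /* handled by the regular loader */
    SECTION_POSTCOPY,   /* switch the incoming side to postcopy */
    SECTION_ABORT       /* unknown type: the load fails */
};

static enum section_action classify_section(uint8_t type, bool knows_postcopy)
{
    /* The new id must not collide with the ids defined next to it. */
    assert(QEMU_VM_POSTCOPY != QEMU_VM_SECTION_FULL &&
           QEMU_VM_POSTCOPY != QEMU_VM_SUBSECTION);

    switch (type) {
    case QEMU_VM_SECTION_FULL:
        return SECTION_LOAD;
    case QEMU_VM_POSTCOPY:
        /* An older destination falls into the unknown-section path. */
        return knows_postcopy ? SECTION_POSTCOPY : SECTION_ABORT;
    default:
        return SECTION_ABORT;
    }
}
```

The point of the sketch is the failure mode: a destination without postcopy
support does not silently misparse the stream, it stops at the unknown
section.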

diff --git a/savevm.c b/savevm.c
index 93c51ab..c93b6eb 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1614,6 +1614,10 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
 
+/* This section is used by postcopy to mark a postcopy-enabled session.
+   A destination that doesn't know it sees an unknown section and aborts. */
+#define QEMU_VM_POSTCOPY             0x10
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
-- 
1.7.10.4


* [Qemu-devel] [PATCH v3 23/35] postcopy: implement incoming part of postcopy live migration
From: Isaku Yamahata @ 2012-10-30  8:32 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This patch implements the incoming part of postcopy live migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- threading, not fork
- use blocking io instead of select + non-blocking io
- don't modify RAMBlock
- When a device allocates its own RAM region, e.g. vshmem, it's handled by
  device save/load. So skip such areas, which have the RAM_PREALLOC_MASK
  flag set.
- less memory overhead
- drop -postcopy option. It is automatically detected.
- various improvements and simplifications
- error handling
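
To make the auto-detection concrete: postcopy_incoming_loadvm_state() reads
a one-byte subtype and a big-endian 32-bit size after the QEMU_VM_POSTCOPY
section id, and the INIT subtype must carry exactly one 64-bit options
word. A standalone sketch of that framing, parsing from a plain byte buffer
instead of a QEMUFile, could look like this. struct postcopy_hdr and
parse_postcopy_hdr() are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subtype values as defined in migration-postcopy.c. */
#define QEMU_VM_POSTCOPY_INIT           0
#define QEMU_VM_POSTCOPY_SECTION_FULL   1

/* Hypothetical decoded header: subtype byte followed by be32 size. */
struct postcopy_hdr {
    uint8_t subtype;
    uint32_t size;
};

/* Returns 0 on success, -1 on a short buffer or an invalid header. */
static int parse_postcopy_hdr(const uint8_t *buf, size_t len,
                              struct postcopy_hdr *hdr)
{
    assert(buf != NULL && hdr != NULL);

    if (len < 5) {
        return -1;
    }
    hdr->subtype = buf[0];
    hdr->size = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
                ((uint32_t)buf[3] << 8) | (uint32_t)buf[4];

    switch (hdr->subtype) {
    case QEMU_VM_POSTCOPY_INIT:
        /* INIT carries a single 64-bit options word. */
        return hdr->size == sizeof(uint64_t) ? 0 : -1;
    case QEMU_VM_POSTCOPY_SECTION_FULL:
        return 0;
    default:
        return -1;
    }
}
```

The size limit that the patch applies to SECTION_FULL (16MB at the moment)
is left out of the sketch.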

Changes v1 -> v2:
- fork umemd early to address qemu devices touching guest ram via
  post/pre_load
- code clean up on initialization
- Makefile.target
  migration-postcopy.c is target dependent due to TARGET_PAGE_xxx,
  so it can't be shared between target architectures.
- use qemu_fopen_fd
- introduce incoming_flags_use_umem_make_present flag
- use MADV_DONTNEED
- make incoming socket nonblocking
- several clean ups
- Dropped QEMUFilePipe
- Moved QEMUFileNonblock to buffered_file
- Split out into umem/incoming/outgoing
- make mig_read nonblocking when socket
- updates for umem device changes
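
The ready handshake between the qemu main loop and the umem thread
(UMEM_DAEMON_READY 'R' one way, UMEM_QEMU_READY 'r' the other way) can be
tried out in isolation. The sketch below runs both ends in a single thread,
which only works because a pipe buffers the single command byte. It uses
plain read()/write() where the patch uses qemu_read_full()/qemu_write_full(),
and write_cmd(), read_cmd() and handshake_demo() are illustrative names, not
functions from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Command bytes as defined in umem.h. */
#define UMEM_DAEMON_READY 'R'
#define UMEM_QEMU_READY   'r'

/* Send one command byte; 0 on success, -1 on failure. */
static int write_cmd(int fd, uint8_t cmd)
{
    assert(fd >= 0);
    return write(fd, &cmd, 1) == 1 ? 0 : -1;
}

/* Receive one command byte and check that it is the expected one. */
static int read_cmd(int fd, uint8_t expect)
{
    uint8_t cmd;

    assert(fd >= 0);
    if (read(fd, &cmd, 1) != 1) {
        return -1;              /* pipe closed or read error */
    }
    return cmd == expect ? 0 : -1;
}

/* Both sides of the handshake, run back to back over two pipes. */
static int handshake_demo(void)
{
    int to_qemu[2];
    int to_umemd[2];
    int ret = -1;

    if (pipe(to_qemu) < 0) {
        return -1;
    }
    if (pipe(to_umemd) < 0) {
        close(to_qemu[0]);
        close(to_qemu[1]);
        return -1;
    }

    if (write_cmd(to_qemu[1], UMEM_DAEMON_READY) == 0 &&   /* umem thread */
        read_cmd(to_qemu[0], UMEM_DAEMON_READY) == 0 &&    /* main loop */
        write_cmd(to_umemd[1], UMEM_QEMU_READY) == 0 &&    /* main loop */
        read_cmd(to_umemd[0], UMEM_QEMU_READY) == 0) {     /* umem thread */
        ret = 0;
    }

    close(to_qemu[0]);
    close(to_qemu[1]);
    close(to_umemd[0]);
    close(to_umemd[1]);
    return ret;
}
```

In the patch the two ends live in different threads, so the reads really
block until the peer has written its byte.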
---
 Makefile.target      |    2 +
 cpu-all.h            |    3 +
 exec.c               |    6 +
 migration-fd.c       |    4 +-
 migration-postcopy.c | 1249 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration-tcp.c      |   10 +-
 migration-unix.c     |   10 +-
 migration.h          |   10 +
 savevm.c             |   28 ++
 vl.c                 |    2 +
 10 files changed, 1315 insertions(+), 9 deletions(-)
 create mode 100644 migration-postcopy.c
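
A note on the request splitting in postcopy_incoming_send_req(): one message
has to fit into the 32KB QEMUFile buffer, so a large page request goes out as
one REQ_PAGE followed by as many REQ_PAGE_CONT messages as needed. The
sketch below only does the arithmetic. REQ_HDR_MAX and count_req_messages()
are hypothetical names, and the header is counted with a 4-byte nr because
the count is written with qemu_put_be32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IO_BUF_SIZE (32 * 1024)
/* cmd (1) + id length (1) + id (up to 256) + nr (be32, 4) */
#define REQ_HDR_MAX (1 + 1 + 256 + 4)
#define MAX_PAGE_NR ((IO_BUF_SIZE - REQ_HDR_MAX) / sizeof(uint64_t))

/*
 * Number of messages needed to send nr page offsets: always one REQ_PAGE,
 * then zero or more REQ_PAGE_CONT. *last_len receives the number of
 * offsets carried by the final message.
 */
static size_t count_req_messages(uint32_t nr, uint32_t *last_len)
{
    size_t msgs = 1;
    uint32_t chunk = nr < MAX_PAGE_NR ? nr : (uint32_t)MAX_PAGE_NR;

    assert(last_len != NULL);
    nr -= chunk;
    while (nr > 0) {
        chunk = nr < MAX_PAGE_NR ? nr : (uint32_t)MAX_PAGE_NR;
        nr -= chunk;
        msgs++;
    }
    *last_len = chunk;
    return msgs;
}
```

With these numbers a single message carries at most 4063 offsets, so a
request for 10000 pages would need three messages.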

diff --git a/Makefile.target b/Makefile.target
index 3822bc5..930c070 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -121,6 +121,8 @@ obj-$(CONFIG_NO_GET_MEMORY_MAPPING) += memory_mapping-stub.o
 obj-$(CONFIG_NO_CORE_DUMP) += dump-stub.o
 LIBS+=-lz
 
+obj-y += migration-postcopy.o umem.o
+
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
 QEMU_CFLAGS += $(VNC_JPEG_CFLAGS)
diff --git a/cpu-all.h b/cpu-all.h
index b5fefc8..79846fe 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -485,6 +485,9 @@ extern ram_addr_t ram_size;
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC_MASK   (1 << 0)
 
+/* RAM is allocated via umem for postcopy incoming mode */
+#define RAM_POSTCOPY_UMEM_MASK  (1 << 1)
+
 typedef struct RAMBlock {
     struct MemoryRegion *mr;
     uint8_t *host;
diff --git a/exec.c b/exec.c
index 2aa4d90..6da991a 100644
--- a/exec.c
+++ b/exec.c
@@ -36,6 +36,7 @@
 #include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "migration.h"
 #if defined(CONFIG_USER_ONLY)
 #include <qemu.h>
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -2555,6 +2556,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
         new_block->host = host;
         new_block->flags |= RAM_PREALLOC_MASK;
     } else {
+        ram_addr_t page_size = getpagesize();
+        size = (size + page_size - 1) & ~(page_size - 1);
         if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
             new_block->host = file_ram_alloc(new_block, size, mem_path);
@@ -2635,6 +2638,9 @@ void qemu_ram_free(ram_addr_t addr)
             ram_list.version++;
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
+            }
+            else if (block->flags & RAM_POSTCOPY_UMEM_MASK) {
+                postcopy_incoming_ram_free(block);
             } else if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
                 if (block->fd) {
diff --git a/migration-fd.c b/migration-fd.c
index b3c54e5..8384975 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -105,7 +105,9 @@ static void fd_accept_incoming_migration(void *opaque)
 
     process_incoming_migration(f);
     qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
-    qemu_fclose(f);
+    if (!incoming_postcopy) {
+        qemu_fclose(f);
+    }
 }
 
 int fd_start_incoming_migration(const char *infd)
diff --git a/migration-postcopy.c b/migration-postcopy.c
new file mode 100644
index 0000000..0809ffa
--- /dev/null
+++ b/migration-postcopy.c
@@ -0,0 +1,1249 @@
+/*
+ * migration-postcopy.c: postcopy live migration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "config-host.h"
+
+#if defined(CONFIG_MADVISE) || defined(CONFIG_POSIX_MADVISE)
+#include <sys/mman.h>
+#endif
+
+#include "bitmap.h"
+#include "sysemu.h"
+#include "kvm.h"
+#include "hw/hw.h"
+#include "arch_init.h"
+#include "migration.h"
+#include "buffered_file.h"
+#include "qemu_socket.h"
+#include "qemu-thread.h"
+#include "umem.h"
+
+#include "memory.h"
+#include "cpu-common.h"
+
+//#define DEBUG_POSTCOPY
+#ifdef DEBUG_POSTCOPY
+#define DPRINTF(fmt, ...)                                               \
+    do {                                                                \
+        printf("%s:%d: " fmt, __func__, __LINE__, ## __VA_ARGS__);      \
+    } while (0)
+#else
+#define DPRINTF(fmt, ...)       do { } while (0)
+#endif
+
+static void fd_close(int *fd)
+{
+    if (*fd >= 0) {
+        close(*fd);
+        *fd = -1;
+    }
+}
+
+static void set_fd(int fd, fd_set *fds, int *nfds)
+{
+    FD_SET(fd, fds);
+    if (fd > *nfds) {
+        *nfds = fd;
+    }
+}
+
+/***************************************************************************
+ * umem daemon on destination <-> qemu on source protocol
+ */
+
+#define QEMU_UMEM_REQ_INIT      0x00
+#define QEMU_UMEM_REQ_EOC       0x01
+#define QEMU_UMEM_REQ_PAGE      0x02
+#define QEMU_UMEM_REQ_PAGE_CONT 0x03
+
+struct qemu_umem_req {
+    int8_t cmd;
+    uint8_t len;
+    char *idstr;        /* REQ_PAGE */
+    uint32_t nr;        /* REQ_PAGE, REQ_PAGE_CONT */
+
+    /* in target page size as qemu migration protocol */
+    uint64_t *pgoffs;   /* REQ_PAGE, REQ_PAGE_CONT */
+};
+
+static void postcopy_incoming_send_req_idstr(QEMUFile *f, const char* idstr)
+{
+    qemu_put_byte(f, strlen(idstr));
+    qemu_put_buffer(f, (uint8_t *)idstr, strlen(idstr));
+}
+
+static void postcopy_incoming_send_req_pgoffs(QEMUFile *f, uint32_t nr,
+                                              const uint64_t *pgoffs)
+{
+    uint32_t i;
+
+    qemu_put_be32(f, nr);
+    for (i = 0; i < nr; i++) {
+        qemu_put_be64(f, pgoffs[i]);
+    }
+}
+
+static void postcopy_incoming_send_req_one(QEMUFile *f,
+                                           const struct qemu_umem_req *req)
+{
+    DPRINTF("cmd %d\n", req->cmd);
+    qemu_put_byte(f, req->cmd);
+    switch (req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+    case QEMU_UMEM_REQ_EOC:
+        /* nothing */
+        break;
+    case QEMU_UMEM_REQ_PAGE:
+        postcopy_incoming_send_req_idstr(f, req->idstr);
+        postcopy_incoming_send_req_pgoffs(f, req->nr, req->pgoffs);
+        break;
+    case QEMU_UMEM_REQ_PAGE_CONT:
+        postcopy_incoming_send_req_pgoffs(f, req->nr, req->pgoffs);
+        break;
+    default:
+        abort();
+        break;
+    }
+}
+
+/* QEMUFile can buffer up to IO_BUF_SIZE = 32 * 1024.
+ * So one message size must be <= IO_BUF_SIZE
+ * cmd: 1
+ * id len: 1
+ * id: 256
+ * nr: 4 (sent with qemu_put_be32())
+ */
+#define MAX_PAGE_NR     ((32 * 1024 - 1 - 1 - 256 - 4) / sizeof(uint64_t))
+static void postcopy_incoming_send_req(QEMUFile *f,
+                                       const struct qemu_umem_req *req)
+{
+    uint32_t nr = req->nr;
+    struct qemu_umem_req tmp = *req;
+
+    switch (req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+    case QEMU_UMEM_REQ_EOC:
+        postcopy_incoming_send_req_one(f, &tmp);
+        break;
+    case QEMU_UMEM_REQ_PAGE:
+        tmp.nr = MIN(nr, MAX_PAGE_NR);
+        postcopy_incoming_send_req_one(f, &tmp);
+
+        nr -= tmp.nr;
+        tmp.pgoffs += tmp.nr;
+        tmp.cmd = QEMU_UMEM_REQ_PAGE_CONT;
+        /* fall through */
+    case QEMU_UMEM_REQ_PAGE_CONT:
+        while (nr > 0) {
+            tmp.nr = MIN(nr, MAX_PAGE_NR);
+            postcopy_incoming_send_req_one(f, &tmp);
+
+            nr -= tmp.nr;
+            tmp.pgoffs += tmp.nr;
+        }
+        break;
+    default:
+        abort();
+        break;
+    }
+}
+
+/***************************************************************************
+ * QEMU_VM_POSTCOPY section subtype
+ */
+#define QEMU_VM_POSTCOPY_INIT           0
+#define QEMU_VM_POSTCOPY_SECTION_FULL   1
+
+/***************************************************************************
+ * incoming part
+ */
+
+bool incoming_postcopy = false;
+
+
+#define PIS_STATE_QUIT_RECEIVED         0x01
+#define PIS_STATE_QUIT_QUEUED           0x02
+#define PIS_STATE_QUIT_SENT             0x04
+
+#define PIS_STATE_QUIT_MASK             (PIS_STATE_QUIT_RECEIVED | \
+                                         PIS_STATE_QUIT_QUEUED | \
+                                         PIS_STATE_QUIT_SENT)
+
+struct PostcopyIncomingState {
+    /* dest qemu state */
+    uint32_t    state;
+
+    int host_page_size;
+    int host_page_shift;
+
+    /* qemu side */
+    int to_umemd_fd;
+    QEMUFile *to_umemd;
+
+    int from_umemd_fd;
+    QEMUFile *from_umemd;
+    int version_id;     /* save/load format version id */
+};
+typedef struct PostcopyIncomingState PostcopyIncomingState;
+
+
+#define UMEM_STATE_EOS_RECEIVED         0x01    /* umem daemon <-> src qemu */
+#define UMEM_STATE_EOC_SEND_REQ         0x02    /* umem daemon <-> src qemu */
+#define UMEM_STATE_EOC_SENDING          0x04    /* umem daemon <-> src qemu */
+#define UMEM_STATE_EOC_SENT             0x08    /* umem daemon <-> src qemu */
+
+#define UMEM_STATE_QUIT_RECEIVED        0x10    /* umem daemon <-> dst qemu */
+#define UMEM_STATE_QUIT_HANDLED         0x20    /* umem daemon <-> dst qemu */
+#define UMEM_STATE_QUIT_QUEUED          0x40    /* umem daemon <-> dst qemu */
+#define UMEM_STATE_QUIT_SENDING         0x80    /* umem daemon <-> dst qemu */
+#define UMEM_STATE_QUIT_SENT            0x100   /* umem daemon <-> dst qemu */
+
+#define UMEM_STATE_ERROR_REQ            0x1000  /* umem daemon error */
+#define UMEM_STATE_ERROR_SENDING        0x2000  /* umem daemon error */
+#define UMEM_STATE_ERROR_SENT           0x4000  /* umem daemon error */
+
+#define UMEM_STATE_QUIT_MASK            (UMEM_STATE_QUIT_QUEUED |   \
+                                         UMEM_STATE_QUIT_SENDING |  \
+                                         UMEM_STATE_QUIT_SENT |     \
+                                         UMEM_STATE_QUIT_RECEIVED | \
+                                         UMEM_STATE_QUIT_HANDLED)
+#define UMEM_STATE_END_MASK             (UMEM_STATE_EOS_RECEIVED | \
+                                         UMEM_STATE_EOC_SEND_REQ | \
+                                         UMEM_STATE_EOC_SENDING |  \
+                                         UMEM_STATE_EOC_SENT |     \
+                                         UMEM_STATE_QUIT_MASK)
+
+struct UMemBlock {
+    UMem* umem;
+    char idstr[256];
+    ram_addr_t offset;
+    ram_addr_t length;
+    QLIST_ENTRY(UMemBlock) next;
+};
+typedef struct UMemBlock UMemBlock;
+
+struct PostcopyIncomingUMemDaemon {
+    /* umem daemon side */
+    QemuMutex mutex;
+    uint32_t state;     /* shared state. protected by mutex */
+
+    /* read only */
+    int host_page_size;
+    int host_page_shift;
+    int nr_host_pages_per_target_page;
+    int host_to_target_page_shift;
+    int nr_target_pages_per_host_page;
+    int target_to_host_page_shift;
+    int version_id;     /* save/load format version id */
+
+    QemuThread thread;
+    QLIST_HEAD(, UMemBlock) blocks;
+
+    /* thread to communicate with qemu main loop via pipe */
+    QemuThread pipe_thread;
+    int to_qemu_fd;
+    QEMUFile *to_qemu;
+    int from_qemu_fd;
+    QEMUFile *from_qemu;
+
+    /* = KVM_MAX_VCPUS * (ASYNC_PF_PER_VCPUS + 1) */
+#define MAX_REQUESTS    (512 * (64 + 1))
+
+    /* thread to read from outgoing qemu */
+    QemuThread mig_read_thread;
+    int mig_read_fd;
+    QEMUFile *mig_read;                 /* qemu on source -> umem daemon */
+    UMemBlock *last_block_read;         /* qemu on source -> umem daemon */
+    /* bitmap indexed by target page offset */
+    unsigned long *phys_received;
+    UMemPages *page_cached;
+
+    /* thread to write to outgoing qemu */
+    QemuThread mig_write_thread;
+    int mig_write_fd;
+    QEMUFile *mig_write;                /* umem daemon -> qemu on source */
+    UMemBlock *last_block_write;        /* umem daemon -> qemu on source */
+    /* bitmap indexed by target page offset */
+    unsigned long *phys_requested;
+    UMemPages *page_request;
+    uint64_t *target_pgoffs;
+};
+typedef struct PostcopyIncomingUMemDaemon PostcopyIncomingUMemDaemon;
+
+static PostcopyIncomingState state = {
+    .state = 0,
+    .to_umemd_fd = -1,
+    .to_umemd = NULL,
+    .from_umemd_fd = -1,
+    .from_umemd = NULL,
+};
+
+static PostcopyIncomingUMemDaemon umemd = {
+    .state = 0,
+    .to_qemu_fd = -1,
+    .to_qemu = NULL,
+    .from_qemu_fd = -1,
+    .from_qemu = NULL,
+    .blocks = QLIST_HEAD_INITIALIZER(&umemd.blocks),
+    .mig_read_fd = -1,
+    .mig_read = NULL,
+    .mig_write_fd = -1,
+    .mig_write = NULL,
+};
+
+static void *postcopy_incoming_umemd(void*);
+static void postcopy_incoming_qemu_handle_req(void *opaque);
+
+/* protected by qemu_mutex_lock_ramlist() */
+void postcopy_incoming_ram_free(RAMBlock *ram_block)
+{
+    UMemBlock *block;
+    QLIST_FOREACH(block, &umemd.blocks, next) {
+        if (!strcmp(ram_block->idstr, block->idstr)) {
+            break;
+        }
+    }
+    if (block != NULL) {
+        umem_unmap(block->umem);
+    } else {
+        munmap(ram_block->host, ram_block->length);
+    }
+}
+
+static int postcopy_incoming_ram_load_get64(QEMUFile *f,
+                                            ram_addr_t *addr, int *flags)
+{
+    *addr = qemu_get_be64(f);
+    *flags = *addr & ~TARGET_PAGE_MASK;
+    *addr &= TARGET_PAGE_MASK;
+    return qemu_file_get_error(f);
+}
+
+int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id)
+{
+    ram_addr_t addr;
+    int flags;
+    int error;
+
+    DPRINTF("incoming ram load\n");
+    /*
+     * RAM_SAVE_FLAGS_EOS or
+     * RAM_SAVE_FLAGS_MEM_SIZE + mem size + RAM_SAVE_FLAGS_EOS
+     * see postcopy_outgoing_ram_save_live()
+     */
+
+    if (version_id != RAM_SAVE_VERSION_ID) {
+        DPRINTF("RAM_SAVE_VERSION_ID %d != %d\n",
+                version_id, RAM_SAVE_VERSION_ID);
+        return -EINVAL;
+    }
+    error = postcopy_incoming_ram_load_get64(f, &addr, &flags);
+    DPRINTF("addr 0x%lx flags 0x%x\n", addr, flags);
+    if (error) {
+        DPRINTF("error %d\n", error);
+        return error;
+    }
+    if (flags == RAM_SAVE_FLAG_EOS && addr == 0) {
+        DPRINTF("EOS\n");
+        return 0;
+    }
+
+    if (flags != RAM_SAVE_FLAG_MEM_SIZE) {
+        DPRINTF("-EINVAL flags 0x%x\n", flags);
+        return -EINVAL;
+    }
+    error = ram_load_mem_size(f, addr);
+    if (error) {
+        DPRINTF("addr 0x%lx error %d\n", addr, error);
+        return error;
+    }
+
+    error = postcopy_incoming_ram_load_get64(f, &addr, &flags);
+    if (error) {
+        DPRINTF("addr 0x%lx flags 0x%x error %d\n", addr, flags, error);
+        return error;
+    }
+    if (flags == RAM_SAVE_FLAG_EOS && addr == 0) {
+        DPRINTF("done\n");
+        return 0;
+    }
+    DPRINTF("-EINVAL\n");
+    return -EINVAL;
+}
+
+static void postcopy_incoming_umem_block_free(void)
+{
+    UMemBlock *block;
+    UMemBlock *tmp;
+
+    /* to protect against postcopy_incoming_ram_free() */
+    qemu_mutex_lock_ramlist();
+    QLIST_FOREACH_SAFE(block, &umemd.blocks, next, tmp) {
+        UMem *umem = block->umem;
+        umem_unmap_shmem(umem);
+        umem_destroy(umem);
+        QLIST_REMOVE(block, next);
+        g_free(block);
+    }
+    qemu_mutex_unlock_ramlist();
+}
+
+static int postcopy_incoming_prepare(void)
+{
+    int error = 0;
+    RAMBlock *block;
+    int nbits;
+
+    state.state = 0;
+    state.host_page_size = getpagesize();
+    state.host_page_shift = ffs(state.host_page_size) - 1;
+    state.version_id = RAM_SAVE_VERSION_ID; /* = save version of
+                                               ram_save_live() */
+
+    qemu_mutex_init(&umemd.mutex);
+    umemd.host_page_size = state.host_page_size;
+    umemd.host_page_shift = state.host_page_shift;
+
+    umemd.nr_host_pages_per_target_page =
+        TARGET_PAGE_SIZE / umemd.host_page_size;
+    umemd.nr_target_pages_per_host_page =
+        umemd.host_page_size / TARGET_PAGE_SIZE;
+    umemd.target_to_host_page_shift =
+        ffs(umemd.nr_host_pages_per_target_page) - 1;
+    umemd.host_to_target_page_shift =
+        ffs(umemd.nr_target_pages_per_host_page) - 1;
+
+    QLIST_INIT(&umemd.blocks);
+    qemu_mutex_lock_ramlist();
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        UMem *umem;
+        UMemBlock *umem_block;
+
+        if (block->flags & RAM_PREALLOC_MASK) {
+            continue;
+        }
+        error = umem_new(block->host, block->length, &umem);
+        if (error < 0) {
+            qemu_mutex_unlock_ramlist();
+            goto out;
+        }
+        umem_block = g_malloc0(sizeof(*umem_block));
+        umem_block->umem = umem;
+        umem_block->offset = block->offset;
+        umem_block->length = block->length;
+        pstrcpy(umem_block->idstr, sizeof(umem_block->idstr), block->idstr);
+
+        error = umem_map_shmem(umem_block->umem);
+        if (error) {
+            qemu_mutex_unlock_ramlist();
+            goto out;
+        }
+        umem_close_shmem(umem_block->umem);
+
+        block->flags |= RAM_POSTCOPY_UMEM_MASK;
+        QLIST_INSERT_HEAD(&umemd.blocks, umem_block, next);
+    }
+    qemu_mutex_unlock_ramlist();
+
+    umemd.page_request = g_malloc(umem_pages_size(MAX_REQUESTS));
+    umemd.page_cached = g_malloc(
+        umem_pages_size(MAX_REQUESTS *
+                        (TARGET_PAGE_SIZE >= umemd.host_page_size ?
+                         1: umemd.nr_host_pages_per_target_page)));
+    umemd.target_pgoffs =
+        g_new(uint64_t, MAX_REQUESTS *
+              MAX(umemd.nr_host_pages_per_target_page,
+                  umemd.nr_target_pages_per_host_page));
+
+    nbits = last_ram_offset() >> TARGET_PAGE_BITS;
+    umemd.phys_requested = bitmap_new(nbits);
+    umemd.phys_received = bitmap_new(nbits);
+    umemd.last_block_read = NULL;
+    umemd.last_block_write = NULL;
+    return 0;
+
+out:
+    postcopy_incoming_umem_block_free();
+    return error;
+}
+
+static int postcopy_incoming_loadvm_init(QEMUFile *f, uint32_t size)
+{
+    uint64_t options;
+    int flags;
+    int error;
+
+    if (size != sizeof(options)) {
+        fprintf(stderr, "unknown size %"PRIu32"\n", size);
+        return -EINVAL;
+    }
+    options = qemu_get_be64(f);
+    if (options) {
+        fprintf(stderr, "unknown options 0x%"PRIx64"\n", options);
+        return -ENOSYS;
+    }
+    flags = fcntl(qemu_file_fd(f), F_GETFL);
+    if ((flags & O_ACCMODE) != O_RDWR) {
+        /* postcopy requires read/write file descriptor */
+        fprintf(stderr, "non-writable connection. "
+                "postcopy requires read/write connection\n");
+        return -EINVAL;
+    }
+    if (mem_path) {
+        fprintf(stderr, "mem_path is specified to %s. "
+                "postcopy doesn't work with it\n", mem_path);
+        return -ENOSYS;
+    }
+
+    DPRINTF("detected POSTCOPY\n");
+    error = postcopy_incoming_prepare();
+    if (error) {
+        return error;
+    }
+    savevm_ram_handlers.load_state = postcopy_incoming_ram_load;
+    incoming_postcopy = true;
+    return 0;
+}
+
+static int postcopy_incoming_create_umemd_thread(QEMUFile *mig_read)
+{
+    int error;
+    int fds[2];
+    int mig_read_fd;
+    int mig_write_fd;
+    assert((fcntl(qemu_file_fd(mig_read), F_GETFL) & O_ACCMODE) == O_RDWR);
+
+    if (qemu_pipe(fds) == -1) {
+        perror("qemu_pipe");
+        abort();
+    }
+    state.from_umemd_fd = fds[0];
+    umemd.to_qemu_fd = fds[1];
+
+    if (qemu_pipe(fds) == -1) {
+        perror("qemu_pipe");
+        abort();
+    }
+    umemd.from_qemu_fd = fds[0];
+    state.to_umemd_fd = fds[1];
+
+    mig_read_fd = qemu_file_fd(mig_read);
+    umemd.state = 0;
+    umemd.version_id = state.version_id;
+    umemd.mig_read_fd = mig_read_fd;
+    umemd.mig_read = mig_read;
+
+    mig_write_fd = dup(mig_read_fd);
+    if (mig_write_fd < 0) {
+        perror("could not dup for writable socket");
+        abort();
+    }
+    umemd.mig_write_fd = mig_write_fd;
+    umemd.mig_write = qemu_fopen_fd(mig_write_fd, "w");
+
+    qemu_thread_create(&umemd.thread, &postcopy_incoming_umemd,
+                       NULL, QEMU_THREAD_DETACHED);
+
+    error = umem_qemu_wait_for_daemon(state.from_umemd_fd);
+    if (error) {
+        return error;
+    }
+    /* now socket is disowned. So tell umem thread that it's safe to use it */
+    error = umem_qemu_ready(state.to_umemd_fd);
+    if (error) {
+        return error;
+    }
+
+    state.from_umemd = qemu_fopen_fd(state.from_umemd_fd, "r");
+    state.to_umemd = qemu_fopen_fd(state.to_umemd_fd, "w");
+    qemu_set_fd_handler(state.from_umemd_fd,
+                        postcopy_incoming_qemu_handle_req, NULL, NULL);
+    return 0;
+}
+
+static int postcopy_incoming_loadvm_section_full(QEMUFile *f, uint32_t size,
+                                                 QEMUFile **buf_file)
+{
+    int error;
+    uint8_t *buf;
+    int read_size;
+
+    /* As size comes from the network, check that it's not unreasonably big.
+     * At the moment, the limit is guessed as 16MB.
+     */
+    DPRINTF("size 0x%"PRIx32"\n", size);
+#define SAVE_VM_FULL_SIZE_MAX   (16 * 1024 * 1024)
+    if (size > SAVE_VM_FULL_SIZE_MAX) {
+        fprintf(stderr,
+                "QEMU_VM_POSTCOPY QEMU_VM_POSTCOPY_SECTION_FULL section seems "
+                "to have unreasonably big size 0x%"PRIx32". aborting.\n"
+                "If its size is really correct, "
+                "please increase it in the code\n",
+                size);
+        return -EINVAL;
+    }
+
+    buf = g_malloc(size);
+    read_size = qemu_get_buffer(f, buf, size);
+    if (size != read_size) {
+        fprintf(stderr, "qemu: warning: error while postcopy size "
+                "%"PRIu32" %d\n", size, read_size);
+        g_free(buf);
+        return -EINVAL;
+    }
+    error = postcopy_incoming_create_umemd_thread(f);
+    if (error) {
+        return error;
+    }
+
+    /* VMStateDescription:pre/post_load and
+     * cpu_synchronize_all_post_init() may fault on guest RAM.
+     * (MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME)
+     * The umem thread needs to be started before the fault.
+     */
+    *buf_file = qemu_fopen_buf_read(buf, size);
+    return 0;
+}
+
+int postcopy_incoming_loadvm_state(QEMUFile *f, QEMUFile **buf_file)
+{
+    int ret = 0;
+    uint8_t subtype;
+    uint32_t size;
+
+    subtype = qemu_get_ubyte(f);
+    size = qemu_get_be32(f);
+    switch (subtype) {
+    case QEMU_VM_POSTCOPY_INIT:
+        ret = postcopy_incoming_loadvm_init(f, size);
+        break;
+    case QEMU_VM_POSTCOPY_SECTION_FULL:
+        ret = postcopy_incoming_loadvm_section_full(f, size, buf_file);
+        break;
+    default:
+        ret = -EINVAL;
+        break;
+    }
+    return ret;
+}
+
+static void postcopy_incoming_qemu_recv_quit(void)
+{
+    if (state.state & PIS_STATE_QUIT_RECEIVED) {
+        return;
+    }
+
+    DPRINTF("|= PIS_STATE_QUIT_RECEIVED\n");
+    state.state |= PIS_STATE_QUIT_RECEIVED;
+    qemu_set_fd_handler(state.from_umemd_fd, NULL, NULL, NULL);
+    qemu_fclose(state.from_umemd);
+    state.from_umemd = NULL;
+    fd_close(&state.from_umemd_fd);
+}
+
+static void postcopy_incoming_qemu_check_quite_queued(void)
+{
+    if (state.state & PIS_STATE_QUIT_QUEUED &&
+        !(state.state & PIS_STATE_QUIT_SENT)) {
+        DPRINTF("|= PIS_STATE_QUIT_SENT\n");
+        state.state |= PIS_STATE_QUIT_SENT;
+
+        qemu_fclose(state.to_umemd);
+        state.to_umemd = NULL;
+        fd_close(&state.to_umemd_fd);
+    }
+}
+
+static void postcopy_incoming_qemu_queue_quit(void)
+{
+    if (state.state & PIS_STATE_QUIT_QUEUED) {
+        return;
+    }
+
+    DPRINTF("|= PIS_STATE_QUIT_QUEUED\n");
+    umem_qemu_quit(state.to_umemd);
+    state.state |= PIS_STATE_QUIT_QUEUED;
+}
+
+static void postcopy_incoming_qemu_handle_req(void *opaque)
+{
+    uint8_t cmd;
+
+    cmd = qemu_get_ubyte(state.from_umemd);
+    DPRINTF("cmd %c\n", cmd);
+
+    switch (cmd) {
+    case UMEM_DAEMON_QUIT:
+        postcopy_incoming_qemu_recv_quit();
+        postcopy_incoming_qemu_queue_quit();
+        postcopy_incoming_qemu_cleanup();
+        break;
+    case UMEM_DAEMON_ERROR:
+        /* the umem daemon hit an error and asked us to stop vm execution */
+        vm_stop(RUN_STATE_IO_ERROR); /* or RUN_STATE_INTERNAL_ERROR */
+        break;
+    default:
+        DPRINTF("unknown command %d\n", cmd);
+        abort();
+        break;
+    }
+
+    postcopy_incoming_qemu_check_quite_queued();
+}
+
+void postcopy_incoming_qemu_cleanup(void)
+{
+    /* If qemu quits before postcopy completes, tell the umem daemon
+       to tear down the umem device and exit. */
+    if (state.to_umemd_fd >= 0) {
+        postcopy_incoming_qemu_queue_quit();
+        postcopy_incoming_qemu_check_quite_queued();
+    }
+}
+
+/**************************************************************************
+ * incoming umem daemon
+ */
+
+static void postcopy_incoming_umem_error_req(void)
+{
+    qemu_mutex_lock(&umemd.mutex);
+    umemd.state |= UMEM_STATE_ERROR_REQ;
+    qemu_mutex_unlock(&umemd.mutex);
+}
+
+static void postcopy_incoming_umem_recv_quit(void)
+{
+    qemu_mutex_lock(&umemd.mutex);
+    if (umemd.state & UMEM_STATE_QUIT_RECEIVED) {
+        qemu_mutex_unlock(&umemd.mutex);
+        return;
+    }
+    DPRINTF("|= UMEM_STATE_QUIT_RECEIVED\n");
+    umemd.state |= UMEM_STATE_QUIT_RECEIVED;
+    qemu_mutex_unlock(&umemd.mutex);
+
+    qemu_fclose(umemd.from_qemu);
+    umemd.from_qemu = NULL;
+    fd_close(&umemd.from_qemu_fd);
+
+    qemu_mutex_lock(&umemd.mutex);
+    DPRINTF("|= UMEM_STATE_QUIT_HANDLED\n");
+    umemd.state |= UMEM_STATE_QUIT_HANDLED;
+    qemu_mutex_unlock(&umemd.mutex);
+}
+
+/* call with umemd.mutex held */
+static void postcopy_incoming_umem_queue_quit_locked(void)
+{
+    if (umemd.state & UMEM_STATE_QUIT_QUEUED) {
+        return;
+    }
+    DPRINTF("|= UMEM_STATE_QUIT_QUEUED\n");
+    umemd.state |= UMEM_STATE_QUIT_QUEUED;
+}
+
+static void postcopy_incoming_umem_check_eoc_req(void)
+{
+    struct qemu_umem_req req;
+
+    qemu_mutex_lock(&umemd.mutex);
+    if (!(umemd.state & UMEM_STATE_EOC_SEND_REQ) ||
+        umemd.state & (UMEM_STATE_EOC_SENDING | UMEM_STATE_EOC_SENT)) {
+        qemu_mutex_unlock(&umemd.mutex);
+        return;
+    }
+
+    DPRINTF("|= UMEM_STATE_EOC_SENDING\n");
+    umemd.state |= UMEM_STATE_EOC_SENDING;
+    qemu_mutex_unlock(&umemd.mutex);
+
+    req.cmd = QEMU_UMEM_REQ_EOC;
+    postcopy_incoming_send_req(umemd.mig_write, &req);
+    qemu_fclose(umemd.mig_write);
+    umemd.mig_write = NULL;
+    fd_close(&umemd.mig_write_fd);
+
+    qemu_mutex_lock(&umemd.mutex);
+    DPRINTF("|= UMEM_STATE_EOC_SENT\n");
+    umemd.state |= UMEM_STATE_EOC_SENT;
+    qemu_mutex_unlock(&umemd.mutex);
+}
+
+static void postcopy_incoming_umem_req_eoc(void)
+{
+    qemu_mutex_lock(&umemd.mutex);
+    DPRINTF("|= UMEM_STATE_EOC_SEND_REQ\n");
+    umemd.state |= UMEM_STATE_EOC_SEND_REQ;
+    qemu_mutex_unlock(&umemd.mutex);
+}
+
+static int postcopy_incoming_umem_send_page_req(UMemBlock *block)
+{
+    int error;
+    struct qemu_umem_req req;
+    unsigned long bit;
+    uint64_t target_pgoff;
+    int i;
+
+    umemd.page_request->nr = MAX_REQUESTS;
+    error = umem_get_page_request(block->umem, umemd.page_request);
+    if (error) {
+        return error;
+    }
+    DPRINTF("id %s nr %"PRId64" offs 0x%"PRIx64" 0x%"PRIx64"\n",
+            block->idstr, (uint64_t)umemd.page_request->nr,
+            (uint64_t)umemd.page_request->pgoffs[0],
+            (uint64_t)umemd.page_request->pgoffs[1]);
+
+    if (umemd.last_block_write != block) {
+        req.cmd = QEMU_UMEM_REQ_PAGE;
+        req.idstr = block->idstr;
+    } else {
+        req.cmd = QEMU_UMEM_REQ_PAGE_CONT;
+    }
+
+    req.nr = 0;
+    req.pgoffs = umemd.target_pgoffs;
+    if (TARGET_PAGE_SIZE >= umemd.host_page_size) {
+        for (i = 0; i < umemd.page_request->nr; i++) {
+            target_pgoff = umemd.page_request->pgoffs[i] >>
+                umemd.host_to_target_page_shift;
+            bit = (block->offset >> TARGET_PAGE_BITS) + target_pgoff;
+
+            if (!test_and_set_bit(bit, umemd.phys_requested)) {
+                req.pgoffs[req.nr] = target_pgoff;
+                req.nr++;
+            }
+        }
+    } else {
+        for (i = 0; i < umemd.page_request->nr; i++) {
+            int j;
+            target_pgoff = umemd.page_request->pgoffs[i] <<
+                umemd.host_to_target_page_shift;
+            bit = (block->offset >> TARGET_PAGE_BITS) + target_pgoff;
+
+            for (j = 0; j < umemd.nr_target_pages_per_host_page; j++) {
+                if (!test_and_set_bit(bit + j, umemd.phys_requested)) {
+                    req.pgoffs[req.nr] = target_pgoff + j;
+                    req.nr++;
+                }
+            }
+        }
+    }
+
+    DPRINTF("id %s nr %d offs 0x%"PRIx64" 0x%"PRIx64"\n",
+            block->idstr, req.nr, req.pgoffs[0], req.pgoffs[1]);
+    if (req.nr > 0 && umemd.mig_write != NULL) {
+        postcopy_incoming_send_req(umemd.mig_write, &req);
+        umemd.last_block_write = block;
+    }
+    return 0;
+}
+
+static void postcopy_incoming_umem_page_fault(UMemBlock *block,
+                                              const UMemPages *pages)
+{
+    uint64_t i;
+
+    for (i = 0; i < pages->nr; i++) {
+        size_t offset = pages->pgoffs[i] << umemd.host_page_shift;
+        RAMBlock *ram_block;
+
+        /* make pages present by forcibly triggering page fault. */
+        qemu_mutex_lock_ramlist();
+        ram_block = ram_find_block(block->idstr, strlen(block->idstr));
+        if (ram_block && offset < ram_block->length) {
+            volatile uint8_t *ram =
+                memory_region_get_ram_ptr(ram_block->mr) + offset;
+            uint8_t dummy_read = ram[0];
+            (void)dummy_read;   /* suppress unused variable warning */
+        }
+        qemu_mutex_unlock_ramlist();
+
+        umem_remove_shmem(block->umem, offset, umemd.host_page_size);
+    }
+}
+
+static bool postcopy_incoming_umem_check_umem_done(void)
+{
+    bool all_done = true;
+    UMemBlock *block;
+
+    QLIST_FOREACH(block, &umemd.blocks, next) {
+        if (umem_shmem_finished(block->umem)) {
+            umem_unmap_shmem(block->umem);
+        } else {
+            all_done = false;
+            break;
+        }
+    }
+
+    return all_done;
+}
+
+static void postcopy_incoming_umem_done(void)
+{
+    postcopy_incoming_umem_req_eoc();
+    qemu_mutex_lock(&umemd.mutex);
+    postcopy_incoming_umem_queue_quit_locked();
+    qemu_mutex_unlock(&umemd.mutex);
+}
+
+static UMemBlock *postcopy_incoming_umem_block_from_stream(
+    QEMUFile *f, int flags)
+{
+    uint8_t len;
+    char id[256];
+    UMemBlock *block;
+
+    if (flags & RAM_SAVE_FLAG_CONTINUE) {
+        return umemd.last_block_read;
+    }
+
+    len = qemu_get_byte(f);
+    qemu_get_buffer(f, (uint8_t*)id, len);
+    id[len] = 0;
+
+    DPRINTF("idstr: %s len %d\n", id, len);
+    QLIST_FOREACH(block, &umemd.blocks, next) {
+        if (!strncmp(id, block->idstr, sizeof(id))) {
+            umemd.last_block_read = block;
+            return block;
+        }
+    }
+    DPRINTF("error\n");
+    return NULL;
+}
+
+static int postcopy_incoming_umem_ram_load(void)
+{
+    ram_addr_t offset;
+    int flags;
+    UMemBlock *block;
+
+    void *shmem;
+    int error;
+    int i;
+    int bit;
+
+    if (umemd.version_id != RAM_SAVE_VERSION_ID) {
+        return -EINVAL;
+    }
+
+    error = postcopy_incoming_ram_load_get64(umemd.mig_read, &offset, &flags);
+    /* DPRINTF("offset 0x%lx flags 0x%x\n", offset, flags); */
+    if (error) {
+        DPRINTF("error %d\n", error);
+        return error;
+    }
+    assert(!(flags & RAM_SAVE_FLAG_MEM_SIZE));
+
+    if (flags & RAM_SAVE_FLAG_EOS) {
+        DPRINTF("RAM_SAVE_FLAG_EOS\n");
+        postcopy_incoming_umem_req_eoc();
+
+        qemu_fclose(umemd.mig_read);
+        umemd.mig_read = NULL;
+        fd_close(&umemd.mig_read_fd);
+
+        qemu_mutex_lock(&umemd.mutex);
+        umemd.state |= UMEM_STATE_EOS_RECEIVED;
+        postcopy_incoming_umem_queue_quit_locked();
+        qemu_mutex_unlock(&umemd.mutex);
+        DPRINTF("|= UMEM_STATE_EOS_RECEIVED\n");
+        return 0;
+    }
+
+    block = postcopy_incoming_umem_block_from_stream(umemd.mig_read, flags);
+    if (block == NULL) {
+        return -EINVAL;
+    }
+    assert(!umem_shmem_finished(block->umem));
+    shmem = block->umem->shmem + offset;
+    error = ram_load_page(umemd.mig_read, shmem, flags);
+    if (error) {
+        DPRINTF("error %d\n", error);
+        return error;
+    }
+    error = qemu_file_get_error(umemd.mig_read);
+    if (error) {
+        DPRINTF("error %d\n", error);
+        return error;
+    }
+
+    umemd.page_cached->nr = 0;
+    bit = (block->offset + offset) >> TARGET_PAGE_BITS;
+    if (!test_and_set_bit(bit, umemd.phys_received)) {
+        if (TARGET_PAGE_SIZE >= umemd.host_page_size) {
+            uint64_t pgoff = offset >> umemd.host_page_shift;
+            for (i = 0; i < umemd.nr_host_pages_per_target_page; i++) {
+                umemd.page_cached->pgoffs[umemd.page_cached->nr] = pgoff + i;
+                umemd.page_cached->nr++;
+            }
+        } else {
+            bool mark_cache = true;
+            for (i = 0; i < umemd.nr_target_pages_per_host_page; i++) {
+                if (!test_bit(bit + i, umemd.phys_received)) {
+                    mark_cache = false;
+                    break;
+                }
+            }
+            if (mark_cache) {
+                umemd.page_cached->pgoffs[0] =
+                    offset >> umemd.host_page_shift;
+                umemd.page_cached->nr = 1;
+            }
+        }
+    }
+
+    if (umemd.page_cached->nr > 0) {
+        error = umem_mark_page_cached(block->umem, umemd.page_cached);
+        if (error) {
+            return error;
+        }
+        postcopy_incoming_umem_page_fault(block, umemd.page_cached);
+        if (postcopy_incoming_umem_check_umem_done()) {
+            postcopy_incoming_umem_done();
+        }
+    }
+
+    return 0;
+}
+
+static int postcopy_incoming_umemd_mig_read_loop(void)
+{
+    int error;
+    /* The read thread doesn't need to poll UMEM_STATE_EOC_SEND_REQ
+     * because the outgoing side always sends RAM_SAVE_FLAG_EOS. */
+    if (umemd.mig_read_fd < 0) {
+        return -EINVAL;
+    }
+    error = postcopy_incoming_umem_ram_load();
+    if (error) {
+        postcopy_incoming_umem_error_req();
+    }
+    return error;
+}
+
+static int postcopy_incoming_umemd_mig_write_loop(void)
+{
+    int ret;
+    UMemBlock *block;
+    /* to check UMEM_STATE_EOC_SEND_REQ periodically */
+    struct timeval timeout = {.tv_sec = 1, .tv_usec = 0};
+    int nfds = -1;
+    fd_set readfds;
+    FD_ZERO(&readfds);
+
+    QLIST_FOREACH(block, &umemd.blocks, next) {
+        set_fd(block->umem->fd, &readfds, &nfds);
+    }
+    ret = select(nfds + 1, &readfds, NULL, NULL, &timeout);
+    if (ret == -1) {
+        if (errno == EINTR) {
+            return 0;
+        }
+        return ret;
+    }
+    QLIST_FOREACH(block, &umemd.blocks, next) {
+        if (FD_ISSET(block->umem->fd, &readfds)) {
+            ret = postcopy_incoming_umem_send_page_req(block);
+            if (ret) {
+                postcopy_incoming_umem_error_req();
+                return ret;
+            }
+        }
+    }
+    if (umemd.mig_write != NULL) {
+        qemu_fflush(umemd.mig_write);
+    }
+    postcopy_incoming_umem_check_eoc_req();
+
+    return 0;
+}
+
+static int postcopy_incoming_umemd_pipe_init(void)
+{
+    int error;
+    error = umem_daemon_ready(umemd.to_qemu_fd);
+    if (error) {
+        goto out;
+    }
+    umemd.to_qemu = qemu_fopen_fd(umemd.to_qemu_fd, "w");
+
+    /* wait for qemu to disown migration_fd */
+    error = umem_daemon_wait_for_qemu(umemd.from_qemu_fd);
+    if (error) {
+        goto out;
+    }
+    umemd.from_qemu = qemu_fopen_fd(umemd.from_qemu_fd, "r");
+    return 0;
+
+out:
+    /* There is no way here to report the error to the main thread
+       so that it can tear down. */
+    perror("initialization error");
+    abort();
+    return error;
+}
+
+static int postcopy_incoming_umemd_pipe_loop(void)
+{
+    int ret;
+    /* to check UMEM_STATE_QUIT_QUEUED periodically */
+    struct timeval timeout = {.tv_sec = 1, .tv_usec = 0};
+    fd_set readfds;
+    int nfds = -1;
+
+    FD_ZERO(&readfds);
+    if (umemd.from_qemu_fd >= 0) {
+        set_fd(umemd.from_qemu_fd, &readfds, &nfds);
+    }
+    ret = select(nfds + 1, &readfds, NULL, NULL, &timeout);
+    if (ret == -1) {
+        if (errno == EINTR) {
+            return 0;
+        }
+        return ret;
+    }
+    if (umemd.from_qemu_fd >= 0 && FD_ISSET(umemd.from_qemu_fd, &readfds)) {
+        uint8_t cmd;
+        cmd = qemu_get_ubyte(umemd.from_qemu);
+        DPRINTF("cmd %c\n", cmd);
+        switch (cmd) {
+        case UMEM_QEMU_QUIT:
+            postcopy_incoming_umem_recv_quit();
+            postcopy_incoming_umem_done();
+            break;
+        default:
+            abort();
+            break;
+        }
+        if (umemd.to_qemu != NULL) {
+            qemu_fflush(umemd.to_qemu);
+        }
+    }
+
+    if (umemd.to_qemu != NULL) {
+        qemu_mutex_lock(&umemd.mutex);
+        if (umemd.state & UMEM_STATE_ERROR_REQ &&
+            !(umemd.state & UMEM_STATE_ERROR_SENDING)) {
+            umemd.state |= UMEM_STATE_ERROR_SENDING;
+            qemu_mutex_unlock(&umemd.mutex);
+            umem_daemon_error(umemd.to_qemu);
+            qemu_mutex_lock(&umemd.mutex);
+            umemd.state |= UMEM_STATE_ERROR_SENT;
+        }
+        if (umemd.state & UMEM_STATE_QUIT_QUEUED &&
+            !(umemd.state & (UMEM_STATE_QUIT_SENDING |
+                             UMEM_STATE_QUIT_SENT))) {
+            DPRINTF("|= UMEM_STATE_QUIT_SENDING\n");
+            umemd.state |= UMEM_STATE_QUIT_SENDING;
+            qemu_mutex_unlock(&umemd.mutex);
+
+            umem_daemon_quit(umemd.to_qemu);
+            qemu_fclose(umemd.to_qemu);
+            umemd.to_qemu = NULL;
+            fd_close(&umemd.to_qemu_fd);
+
+            qemu_mutex_lock(&umemd.mutex);
+            DPRINTF("|= UMEM_STATE_QUIT_SENT\n");
+            umemd.state |= UMEM_STATE_QUIT_SENT;
+        }
+        qemu_mutex_unlock(&umemd.mutex);
+    }
+
+    return 0;
+}
+
+struct IncomingThread {
+    int (*init_func)(void);
+    int (*loop_func)(void);
+};
+typedef struct IncomingThread IncomingThread;
+
+static void *postcopy_incoming_umemd_thread(void* arg)
+{
+    IncomingThread *im  = arg;
+    int error;
+
+    DPRINTF("loop %d %p %p\n", getpid(), im->init_func, im->loop_func);
+    if (im->init_func) {
+        error = im->init_func();
+        if (error) {
+            postcopy_incoming_umem_error_req();
+            return NULL;
+        }
+    }
+    for (;;) {
+        qemu_mutex_lock(&umemd.mutex);
+        if ((umemd.state & UMEM_STATE_END_MASK) == UMEM_STATE_END_MASK) {
+            qemu_mutex_unlock(&umemd.mutex);
+            DPRINTF("loop out %p\n", im->loop_func);
+            break;
+        }
+        qemu_mutex_unlock(&umemd.mutex);
+
+        error = im->loop_func();
+        if (error) {
+            DPRINTF("func %p error = %d\n", im->loop_func, error);
+            break;
+        }
+    }
+    return NULL;
+}
+
+static void *postcopy_incoming_umemd(void* unused)
+{
+    DPRINTF("umemd\n");
+    qemu_thread_create(&umemd.mig_read_thread,
+                       &postcopy_incoming_umemd_thread,
+                       &(IncomingThread) {
+                           NULL, &postcopy_incoming_umemd_mig_read_loop,},
+                       QEMU_THREAD_JOINABLE);
+    qemu_thread_create(&umemd.mig_write_thread,
+                       &postcopy_incoming_umemd_thread,
+                       &(IncomingThread) {
+                           NULL, &postcopy_incoming_umemd_mig_write_loop,},
+                       QEMU_THREAD_JOINABLE);
+    qemu_thread_create(&umemd.pipe_thread, &postcopy_incoming_umemd_thread,
+                       &(IncomingThread) {
+                           &postcopy_incoming_umemd_pipe_init,
+                           &postcopy_incoming_umemd_pipe_loop,},
+                       QEMU_THREAD_JOINABLE);
+
+    qemu_thread_join(&umemd.mig_read_thread);
+    qemu_thread_join(&umemd.mig_write_thread);
+    qemu_thread_join(&umemd.pipe_thread);
+
+    g_free(umemd.page_request);
+    g_free(umemd.page_cached);
+    g_free(umemd.target_pgoffs);
+    g_free(umemd.phys_requested);
+    g_free(umemd.phys_received);
+
+    postcopy_incoming_umem_block_free();
+
+    DPRINTF("umemd done\n");
+    return NULL;
+}
diff --git a/migration-tcp.c b/migration-tcp.c
index a15c2b8..69c655d 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -107,13 +107,15 @@ static void tcp_accept_incoming_migration(void *opaque)
     f = qemu_fopen_socket(c);
     if (f == NULL) {
         fprintf(stderr, "could not qemu_fopen socket\n");
-        goto out;
+        close(c);
+        goto out2;
     }
 
     process_incoming_migration(f);
-    qemu_fclose(f);
-out:
-    close(c);
+    if (!incoming_postcopy) {
+        qemu_fclose(f);
+        close(c);
+    }
 out2:
     qemu_set_fd_handler2(s, NULL, NULL, NULL, NULL);
     close(s);
diff --git a/migration-unix.c b/migration-unix.c
index 169de88..d4e2431 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -140,13 +140,15 @@ static void unix_accept_incoming_migration(void *opaque)
     f = qemu_fopen_socket(c);
     if (f == NULL) {
         fprintf(stderr, "could not qemu_fopen socket\n");
-        goto out;
+        close(c);
+        goto out2;
     }
 
     process_incoming_migration(f);
-    qemu_fclose(f);
-out:
-    close(c);
+    if (!incoming_postcopy) {
+        qemu_fclose(f);
+        close(c);
+    }
 out2:
     qemu_set_fd_handler2(s, NULL, NULL, NULL, NULL);
     close(s);
diff --git a/migration.h b/migration.h
index 2d27738..0766691 100644
--- a/migration.h
+++ b/migration.h
@@ -134,4 +134,14 @@ int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+/* For incoming postcopy */
+extern bool incoming_postcopy;
+
+int postcopy_incoming_loadvm_state(QEMUFile *f, QEMUFile **buf_file);
+int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id);
+void postcopy_incoming_qemu_cleanup(void);
+#if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void postcopy_incoming_ram_free(RAMBlock *ram_block);
+#endif
+
 #endif
diff --git a/savevm.c b/savevm.c
index c93b6eb..d1488d2 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1981,6 +1981,10 @@ int qemu_loadvm_state(QEMUFile *f)
     uint8_t section_type;
     unsigned int v;
     int ret;
+    QEMUFile *orig_f = NULL;
+
+    /* postcopy may change this. restore later */
+    LoadStateHandler *old_ram_load = savevm_ram_handlers.load_state;
 
     if (qemu_savevm_state_blocked(NULL)) {
         return -EINVAL;
@@ -2048,6 +2052,7 @@ int qemu_loadvm_state(QEMUFile *f)
             break;
         case QEMU_VM_SECTION_PART:
         case QEMU_VM_SECTION_END:
+            assert(orig_f == NULL);
             section_id = qemu_get_be32(f);
 
             QLIST_FOREACH(le, &loadvm_handlers, entry) {
@@ -2068,6 +2073,23 @@ int qemu_loadvm_state(QEMUFile *f)
                 goto out;
             }
             break;
+        case QEMU_VM_POSTCOPY: {
+            QEMUFile *buf_file = NULL;
+            ret = postcopy_incoming_loadvm_state(f, &buf_file);
+            if (ret) {
+                goto out;
+            }
+            if (buf_file != NULL) {
+                /* VMStateDescription:pre/post_load and
+                 * cpu_synchronize_all_post_init() may fault on guest RAM.
+                 * (MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME)
+                 * The postcopy threads need to be created before the fault.
+                 */
+                orig_f = f;
+                f = buf_file;
+            }
+            break;
+        }
         default:
             fprintf(stderr, "Unknown savevm section type %d\n", section_type);
             ret = -EINVAL;
@@ -2080,6 +2102,12 @@ int qemu_loadvm_state(QEMUFile *f)
     ret = 0;
 
 out:
+    if (orig_f != NULL) {
+        assert(incoming_postcopy);
+        qemu_fclose(f);
+        f = orig_f;
+    }
+    savevm_ram_handlers.load_state = old_ram_load;
     QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
         QLIST_REMOVE(le, entry);
         g_free(le);
diff --git a/vl.c b/vl.c
index 723fc59..3221f50 100644
--- a/vl.c
+++ b/vl.c
@@ -3789,6 +3789,8 @@ int main(int argc, char **argv, char **envp)
     bdrv_close_all();
     pause_all_vcpus();
     net_cleanup();
+    postcopy_incoming_qemu_cleanup();
+
     res_free();
 
     return 0;
-- 
1.7.10.4
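
The host-page to target-page conversion in postcopy_incoming_umem_send_page_req() can be illustrated in isolation. The following is only a minimal sketch of the "host page larger than target page" branch, assuming a plain unsigned long bitmap; every sketch_* name is hypothetical and is not part of this series, and sketch_test_and_set_bit() is a non-atomic stand-in for QEMU's test_and_set_bit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, non-atomic stand-in for QEMU's test_and_set_bit(). */
static bool sketch_test_and_set_bit(unsigned long nr, unsigned long *map)
{
    const unsigned long bits = sizeof(unsigned long) * 8;
    unsigned long mask = 1UL << (nr % bits);
    bool was_set = (map[nr / bits] & mask) != 0;

    map[nr / bits] |= mask;
    return was_set;
}

/*
 * Expand faulting host-page offsets into target-page offsets when the
 * host page is larger than the target page (shift = log2(host / target)).
 * Target pages already marked in 'requested' are skipped, so each page
 * is requested at most once. Returns the number of offsets stored in 'out'.
 */
size_t sketch_expand_requests(const uint64_t *host_pgoffs, size_t nr,
                              unsigned shift, uint64_t block_first_page,
                              unsigned long *requested,
                              uint64_t *out, size_t out_max)
{
    uint64_t per_host = (uint64_t)1 << shift;
    size_t n = 0;

    for (size_t i = 0; i < nr; i++) {
        uint64_t target_pgoff = host_pgoffs[i] << shift;

        for (uint64_t j = 0; j < per_host; j++) {
            unsigned long bit =
                (unsigned long)(block_first_page + target_pgoff + j);

            if (!sketch_test_and_set_bit(bit, requested)) {
                assert(n < out_max);    /* caller sizes 'out' generously */
                out[n++] = target_pgoff + j;
            }
        }
    }
    return n;
}
```

With shift = 1, a fault on host page 1 yields target pages 2 and 3, and a second fault on the same host page yields nothing because both bits are already set.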


* [Qemu-devel] [PATCH v3 24/35] postcopy outgoing: add -p option to migrate command
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (22 preceding siblings ...)
  2012-10-30  8:32 ` [Qemu-devel] [PATCH v3 23/35] postcopy: implement incoming part of postcopy live migration Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-11-01 19:48   ` Eric Blake
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 25/35] postcopy: implement outgoing part of postcopy live migration Isaku Yamahata
                   ` (13 subsequent siblings)
  37 siblings, 1 reply; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Add a -p option to the migrate command to select postcopy mode, and
introduce a postcopy migration parameter that indicates postcopy mode
is enabled.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v1 -> v2:
- catch up with qapi change
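
The optional 'postcopy' argument reaches qmp_migrate() as a (has_postcopy, postcopy) pair, like the existing blk and inc arguments. The following is a minimal sketch of that convention, not code from this patch: the sketch_* names are hypothetical, and treating an absent argument as false is an assumption (the patch itself assigns the postcopy value directly).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of MigrationParams with the new field. */
struct SketchMigrationParams {
    bool blk;
    bool shared;
    bool postcopy;
};

/* An optional bool arrives as (has_value, value); absent means false here. */
static bool sketch_optional_bool(bool has_value, bool value)
{
    return has_value ? value : false;
}

struct SketchMigrationParams sketch_build_params(bool has_blk, bool blk,
                                                 bool has_inc, bool inc,
                                                 bool has_postcopy,
                                                 bool postcopy)
{
    struct SketchMigrationParams params = {
        .blk = sketch_optional_bool(has_blk, blk),
        .shared = sketch_optional_bool(has_inc, inc),
        .postcopy = sketch_optional_bool(has_postcopy, postcopy),
    };
    return params;
}

/* Usage: an HMP-style caller passes "!!flag, flag", as hmp_migrate() does. */
void sketch_params_example(void)
{
    int postcopy = 1;   /* as if "-p" had been given */
    struct SketchMigrationParams p =
        sketch_build_params(false, false, false, false, !!postcopy, postcopy);

    assert(p.postcopy && !p.blk && !p.shared);
}
```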
---
 hmp-commands.hx  |   10 ++++++----
 hmp.c            |    4 +++-
 migration.c      |    3 ++-
 migration.h      |    1 +
 qapi-schema.json |    3 ++-
 qmp-commands.hx  |    3 ++-
 savevm.c         |    3 ++-
 7 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e0b537d..f2f1264 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,23 +826,25 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s",
+        .params     = "[-d] [-b] [-i] [-p] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "(base image shared between src and destination)"
+		      "\n\t\t\t-p for migration with postcopy mode enabled",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-p] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
+	-p for migration with postcopy mode enabled
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index 2b97982..2ea3bc4 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1035,10 +1035,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int detach = qdict_get_try_bool(qdict, "detach", 0);
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
+    int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
+                !!postcopy, postcopy, &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
         error_free(err);
diff --git a/migration.c b/migration.c
index 00b0bc2..8bb6073 100644
--- a/migration.c
+++ b/migration.c
@@ -480,7 +480,7 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 Error **errp)
+                 bool has_postcopy, bool postcopy, Error **errp)
 {
     MigrationState *s = migrate_get_current();
     MigrationParams params;
@@ -489,6 +489,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
     params.blk = blk;
     params.shared = inc;
+    params.postcopy = postcopy;
 
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 0766691..b21df18 100644
--- a/migration.h
+++ b/migration.h
@@ -24,6 +24,7 @@
 struct MigrationParams {
     bool blk;
     bool shared;
+    bool postcopy;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index c615ee2..c969e5a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2094,7 +2094,8 @@
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
+           '*postcopy': 'bool'} }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 5ba8c48..ece7a7e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -518,7 +518,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
@@ -532,6 +532,7 @@ Arguments:
 
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
+- "postcopy": postcopy migration (json-bool, optional)
 - "uri": Destination URI (json-string)
 
 Example:
diff --git a/savevm.c b/savevm.c
index d1488d2..04b03cf 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1806,7 +1806,8 @@ static int qemu_savevm_state(QEMUFile *f)
     int ret;
     MigrationParams params = {
         .blk = 0,
-        .shared = 0
+        .shared = 0,
+        .postcopy = 0,
     };
 
     if (qemu_savevm_state_blocked(NULL)) {
-- 
1.7.10.4


* [Qemu-devel] [PATCH v3 25/35] postcopy: implement outgoing part of postcopy live migration
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (23 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 24/35] postcopy outgoing: add -p option to migrate command Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 26/35] postcopy/outgoing: add -n options to disable background transfer Isaku Yamahata
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This patch implements the outgoing part of postcopy live migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- modify savevm_ram_handlers instead of if (postcopy)
- code simplification

Changes v1 -> v2:
- fix parameter to qemu_fdopen()
- handle QEMU_UMEM_REQ_EOC properly:
  the QEMU_UMEM_REQ_EOC request was ignored in the
  PO_STATE_ALL_PAGES_SENT state; handle it properly
- flush on-demand pages unconditionally
- improve postcopy_outgoing_ram_save_live and postcopy_outgoing_begin()
- use qemu_fopen_fd
- use memory api instead of obsolete api
- fix segv in postcopy_outgoing_check_all_ram_sent()
- catch up with qapi change
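
The v3 approach of modifying savevm_ram_handlers instead of testing "if (postcopy)" at each call site can be sketched on its own. This is only an illustration under simplified assumptions: the sketch_* types and functions are hypothetical, the handlers take no arguments, and the return values are markers that show which path ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_PRECOPY = 1, SKETCH_POSTCOPY = 2 };

struct SketchParams {
    bool postcopy;
};

struct SketchHandlers {
    void (*set_params)(const struct SketchParams *params);
    int (*save_live_iterate)(void);
    int (*save_live_complete)(void);
};

static int sketch_precopy_iterate(void)   { return SKETCH_PRECOPY; }
static int sketch_precopy_complete(void)  { return SKETCH_PRECOPY; }
static int sketch_postcopy_iterate(void)  { return SKETCH_POSTCOPY; }
static int sketch_postcopy_complete(void) { return SKETCH_POSTCOPY; }

static void sketch_set_params(const struct SketchParams *params);

/* The table starts out with the precopy handlers installed. */
struct SketchHandlers sketch_handlers = {
    .set_params = sketch_set_params,
    .save_live_iterate = sketch_precopy_iterate,
    .save_live_complete = sketch_precopy_complete,
};

/* Swap the iterate/complete callbacks once, when parameters are set. */
static void sketch_set_params(const struct SketchParams *params)
{
    assert(params != NULL);
    if (params->postcopy) {
        sketch_handlers.save_live_iterate = sketch_postcopy_iterate;
        sketch_handlers.save_live_complete = sketch_postcopy_complete;
    } else {
        sketch_handlers.save_live_iterate = sketch_precopy_iterate;
        sketch_handlers.save_live_complete = sketch_precopy_complete;
    }
}

/* Callers never test the postcopy flag; they just go through the table. */
int sketch_run_iterate(const struct SketchParams *params)
{
    sketch_handlers.set_params(params);
    return sketch_handlers.save_live_iterate();
}
```

Because the table is global state, a later non-postcopy migration has to set the parameters again to restore the precopy callbacks, which is what the else branch does.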
---
 arch_init.c          |   22 ++-
 migration-exec.c     |    4 +
 migration-fd.c       |   17 ++
 migration-postcopy.c |  423 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration-tcp.c      |    6 +-
 migration-unix.c     |   26 +++-
 migration.c          |   32 +++-
 migration.h          |   18 +++
 savevm.c             |   35 ++++-
 sysemu.h             |    2 +-
 10 files changed, 572 insertions(+), 13 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d82316d..d95ce7b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -189,7 +189,6 @@ static struct {
     .cache = NULL,
 };
 
-
 int64_t xbzrle_cache_resize(int64_t new_size)
 {
     if (XBZRLE.cache != NULL) {
@@ -591,6 +590,7 @@ static void reset_ram_globals(void)
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
+    const MigrationParams *params = &migrate_get_current()->params;
     migration_bitmap_init();
 
     qemu_mutex_lock_ramlist();
@@ -610,8 +610,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         acct_clear();
     }
 
-    memory_global_dirty_log_start();
-    migration_bitmap_sync();
+    if (!params->postcopy) {
+        memory_global_dirty_log_start();
+        migration_bitmap_sync();
+    }
 
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
@@ -916,7 +918,21 @@ done:
     return ret;
 }
 
+static void ram_save_set_params(const MigrationParams *params, void *opaque)
+{
+    if (params->postcopy) {
+        savevm_ram_handlers.save_live_iterate =
+            postcopy_outgoing_ram_save_iterate;
+        savevm_ram_handlers.save_live_complete =
+            postcopy_outgoing_ram_save_complete;
+    } else {
+        savevm_ram_handlers.save_live_iterate = ram_save_iterate;
+        savevm_ram_handlers.save_live_complete = ram_save_complete;
+    }
+}
+
 SaveVMHandlers savevm_ram_handlers = {
+    .set_params = ram_save_set_params,
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
     .save_live_complete = ram_save_complete,
diff --git a/migration-exec.c b/migration-exec.c
index 95e9779..10bbecf 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command)
 {
     FILE *f;
 
+    if (s->params.postcopy) {
+        return -ENOSYS;
+    }
+
     f = popen(command, "w");
     if (f == NULL) {
         DPRINTF("Unable to popen exec target\n");
diff --git a/migration-fd.c b/migration-fd.c
index 8384975..f68fa28 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
     s->write = fd_write;
     s->close = fd_close;
 
+    if (s->params.postcopy) {
+        int flags = fcntl(s->fd, F_GETFL);
+        if ((flags & O_ACCMODE) != O_RDWR) {
+            goto err_after_open;
+        }
+
+        s->fd_read = dup(s->fd);
+        if (s->fd_read == -1) {
+            goto err_after_open;
+        }
+        s->file_read = qemu_fopen_fd(s->fd_read, "rb");
+        if (s->file_read == NULL) {
+            close(s->fd_read);
+            goto err_after_open;
+        }
+    }
+
     migrate_fd_connect(s);
     return 0;
 
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 0809ffa..399e233 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -167,6 +167,107 @@ static void postcopy_incoming_send_req(QEMUFile *f,
     }
 }
 
+static int postcopy_outgoing_recv_req_idstr(QEMUFile *f,
+                                            struct qemu_umem_req *req,
+                                            size_t *offset)
+{
+    int ret;
+
+    req->len = qemu_peek_byte(f, *offset);
+    *offset += 1;
+    if (req->len == 0) {
+        return -EAGAIN;
+    }
+    req->idstr = g_malloc((int)req->len + 1);
+    ret = qemu_peek_buffer(f, (uint8_t*)req->idstr, req->len, *offset);
+    *offset += ret;
+    if (ret != req->len) {
+        g_free(req->idstr);
+        req->idstr = NULL;
+        return -EAGAIN;
+    }
+    req->idstr[req->len] = 0;
+    return 0;
+}
+
+static int postcopy_outgoing_recv_req_pgoffs(QEMUFile *f,
+                                             struct qemu_umem_req *req,
+                                             size_t *offset)
+{
+    int ret;
+    uint32_t be32;
+    uint32_t i;
+
+    ret = qemu_peek_buffer(f, (uint8_t*)&be32, sizeof(be32), *offset);
+    *offset += sizeof(be32);
+    if (ret != sizeof(be32)) {
+        return -EAGAIN;
+    }
+
+    req->nr = be32_to_cpu(be32);
+    req->pgoffs = g_new(uint64_t, req->nr);
+    for (i = 0; i < req->nr; i++) {
+        uint64_t be64;
+        ret = qemu_peek_buffer(f, (uint8_t*)&be64, sizeof(be64), *offset);
+        *offset += sizeof(be64);
+        if (ret != sizeof(be64)) {
+            g_free(req->pgoffs);
+            req->pgoffs = NULL;
+            return -EAGAIN;
+        }
+        req->pgoffs[i] = be64_to_cpu(be64);
+    }
+    return 0;
+}
+
+static int postcopy_outgoing_recv_req(QEMUFile *f, struct qemu_umem_req *req)
+{
+    int size;
+    int ret;
+    size_t offset = 0;
+
+    size = qemu_peek_buffer(f, (uint8_t*)&req->cmd, 1, offset);
+    if (size <= 0) {
+        return -EAGAIN;
+    }
+    offset += 1;
+
+    switch (req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+    case QEMU_UMEM_REQ_EOC:
+        /* nothing */
+        break;
+    case QEMU_UMEM_REQ_PAGE:
+        ret = postcopy_outgoing_recv_req_idstr(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        ret = postcopy_outgoing_recv_req_pgoffs(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        break;
+    case QEMU_UMEM_REQ_PAGE_CONT:
+        ret = postcopy_outgoing_recv_req_pgoffs(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        break;
+    default:
+        abort();
+        break;
+    }
+    qemu_file_skip(f, offset);
+    DPRINTF("cmd %d\n", req->cmd);
+    return 0;
+}
+
+static void postcopy_outgoing_free_req(struct qemu_umem_req *req)
+{
+    g_free(req->idstr);
+    g_free(req->pgoffs);
+}
+
 /***************************************************************************
  * QEMU_VM_POSTCOPY section subtype
  */
@@ -174,6 +275,328 @@ static void postcopy_incoming_send_req(QEMUFile *f,
 #define QEMU_VM_POSTCOPY_SECTION_FULL   1
 
 /***************************************************************************
+ * outgoing part
+ */
+
+enum POState {
+    PO_STATE_ERROR_RECEIVE,
+    PO_STATE_ACTIVE,
+    PO_STATE_EOC_RECEIVED,
+    PO_STATE_ALL_PAGES_SENT,
+    PO_STATE_COMPLETED,
+};
+typedef enum POState POState;
+
+struct PostcopyOutgoingState {
+    POState state;
+    QEMUFile *mig_read;
+    int fd_read;
+    RAMBlock *last_block_read;
+
+    QEMUFile *mig_buffered_write;
+    MigrationState *ms;
+};
+
+int postcopy_outgoing_create_read_socket(MigrationState *s)
+{
+    if (!s->params.postcopy) {
+        return 0;
+    }
+
+    s->fd_read = dup(s->fd);
+    if (s->fd_read == -1) {
+        int ret = -errno;
+        perror("dup");
+        return ret;
+    }
+    s->file_read = qemu_fopen_socket(s->fd_read);
+    if (s->file_read == NULL) {
+        return -EINVAL;
+    }
+    return 0;
+}
+
+void postcopy_outgoing_state_begin(QEMUFile *f)
+{
+    uint64_t options = 0;
+    qemu_put_ubyte(f, QEMU_VM_POSTCOPY_INIT);
+    qemu_put_be32(f, sizeof(options));
+    qemu_put_be64(f, options);
+}
+
+void postcopy_outgoing_state_complete(
+    QEMUFile *f, const uint8_t *buffer, size_t buffer_size)
+{
+    qemu_put_ubyte(f, QEMU_VM_POSTCOPY_SECTION_FULL);
+    qemu_put_be32(f, buffer_size);
+    qemu_put_buffer(f, buffer, buffer_size);
+}
+
+int postcopy_outgoing_ram_save_iterate(QEMUFile *f, void *opaque)
+{
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    return 1;
+}
+
+int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque)
+{
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    return 0;
+}
+
+/*
+ * return value
+ *   0: continue postcopy mode
+ * > 0: completed postcopy mode.
+ * < 0: error
+ */
+static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
+                                        const struct qemu_umem_req *req,
+                                        bool *written)
+{
+    int i;
+    RAMBlock *block;
+
+    DPRINTF("cmd %d state %d\n", req->cmd, s->state);
+    switch(req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+        /* nothing */
+        break;
+    case QEMU_UMEM_REQ_EOC:
+        /* tell to finish migration. */
+        if (s->state == PO_STATE_ALL_PAGES_SENT) {
+            s->state = PO_STATE_COMPLETED;
+            DPRINTF("-> PO_STATE_COMPLETED\n");
+        } else {
+            s->state = PO_STATE_EOC_RECEIVED;
+            DPRINTF("-> PO_STATE_EOC_RECEIVED\n");
+        }
+        return 1;
+    case QEMU_UMEM_REQ_PAGE:
+        DPRINTF("idstr: %s\n", req->idstr);
+        block = ram_find_block(req->idstr, strlen(req->idstr));
+        if (block == NULL) {
+            return -EINVAL;
+        }
+        s->last_block_read = block;
+        /* fall through */
+    case QEMU_UMEM_REQ_PAGE_CONT:
+        DPRINTF("nr %d\n", req->nr);
+        if (s->mig_buffered_write == NULL) {
+            assert(s->state == PO_STATE_ALL_PAGES_SENT);
+            break;
+        }
+        for (i = 0; i < req->nr; i++) {
+            DPRINTF("offs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
+            int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
+                                    req->pgoffs[i] << TARGET_PAGE_BITS, false);
+            if (ret > 0) {
+                *written = true;
+            }
+        }
+        break;
+    default:
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static void postcopy_outgoing_close_mig_read(PostcopyOutgoingState *s)
+{
+    if (s->mig_read != NULL) {
+        qemu_set_fd_handler(s->fd_read, NULL, NULL, NULL);
+        qemu_fclose(s->mig_read);
+        s->mig_read = NULL;
+        fd_close(&s->fd_read);
+
+        s->ms->file_read = NULL;
+        s->ms->fd_read = -1;
+    }
+}
+
+static void postcopy_outgoing_completed(PostcopyOutgoingState *s)
+{
+    postcopy_outgoing_close_mig_read(s);
+    s->ms->postcopy = NULL;
+    g_free(s);
+}
+
+static void postcopy_outgoing_recv_handler(void *opaque)
+{
+    PostcopyOutgoingState *s = opaque;
+    bool written = false;
+    int ret = 0;
+
+    assert(s->state == PO_STATE_ACTIVE ||
+           s->state == PO_STATE_ALL_PAGES_SENT);
+
+    do {
+        struct qemu_umem_req req = {.idstr = NULL,
+                                    .pgoffs = NULL};
+
+        ret = postcopy_outgoing_recv_req(s->mig_read, &req);
+        if (ret < 0) {
+            if (ret == -EAGAIN) {
+                ret = 0;
+            }
+            break;
+        }
+
+        /* Even when s->state == PO_STATE_ALL_PAGES_SENT,
+           requests such as QEMU_UMEM_REQ_EOC can still be received */
+        ret = postcopy_outgoing_handle_req(s, &req, &written);
+        postcopy_outgoing_free_req(&req);
+    } while (ret == 0);
+
+    /*
+     * flush buffered_file.
+     * Although mig_buffered_write is a rate-limited buffered file, the
+     * written pages were requested on demand by the destination, so push
+     * them out immediately, ignoring the rate limit.
+     */
+    if (written) {
+        qemu_buffered_file_drain(s->mig_buffered_write);
+    }
+
+    if (ret < 0) {
+        switch (s->state) {
+        case PO_STATE_ACTIVE:
+            s->state = PO_STATE_ERROR_RECEIVE;
+            DPRINTF("-> PO_STATE_ERROR_RECEIVE\n");
+            break;
+        case PO_STATE_ALL_PAGES_SENT:
+            s->state = PO_STATE_COMPLETED;
+            DPRINTF("-> PO_STATE_COMPLETED\n");
+            break;
+        default:
+            abort();
+        }
+    }
+    if (s->state == PO_STATE_ERROR_RECEIVE || s->state == PO_STATE_COMPLETED) {
+        postcopy_outgoing_close_mig_read(s);
+    }
+    if (s->state == PO_STATE_COMPLETED) {
+        DPRINTF("PO_STATE_COMPLETED\n");
+        MigrationState *ms = s->ms;
+        postcopy_outgoing_completed(s);
+        migrate_fd_completed(ms);
+    }
+}
+
+PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms)
+{
+    PostcopyOutgoingState *s = g_new(PostcopyOutgoingState, 1);
+    DPRINTF("outgoing begin\n");
+    qemu_buffered_file_drain(ms->file);
+
+    s->ms = ms;
+    s->state = PO_STATE_ACTIVE;
+    s->fd_read = ms->fd_read;
+    s->mig_read = ms->file_read;
+    s->mig_buffered_write = ms->file;
+
+    /* Make sure all dirty bits are set */
+    memory_global_dirty_log_stop();
+    migration_bitmap_init();
+
+    qemu_set_fd_handler(s->fd_read,
+                        &postcopy_outgoing_recv_handler, NULL, s);
+    postcopy_outgoing_recv_handler(s);
+    return s;
+}
+
+static void postcopy_outgoing_ram_all_sent(QEMUFile *f,
+                                           PostcopyOutgoingState *s)
+{
+    assert(s->state == PO_STATE_ACTIVE);
+
+    s->state = PO_STATE_ALL_PAGES_SENT;
+    /* tell incoming side that all pages are sent */
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_buffered_file_drain(f);
+    DPRINTF("sent RAM_SAVE_FLAG_EOS\n");
+    migrate_fd_cleanup(s->ms);
+
+    /* Later migrate_fd_completed() will be called, which calls
+     * migrate_fd_cleanup() again, so a dummy file is created
+     * to keep the qemu monitor working.
+     */
+    s->ms->file = qemu_fopen_ops(NULL, NULL, NULL, NULL, NULL,
+                                 NULL, NULL);
+    s->mig_buffered_write = NULL;
+
+    migration_bitmap_free();
+}
+
+int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy)
+{
+    PostcopyOutgoingState *s = postcopy;
+#define MAX_WAIT        50      /* stolen from ram_save_iterate() */
+    double t0;
+    int i;
+
+    assert(s->state == PO_STATE_ACTIVE ||
+           s->state == PO_STATE_EOC_RECEIVED ||
+           s->state == PO_STATE_ERROR_RECEIVE);
+
+    switch (s->state) {
+    case PO_STATE_ACTIVE:
+        /* nothing. processed below */
+        break;
+    case PO_STATE_EOC_RECEIVED:
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+        s->state = PO_STATE_COMPLETED;
+        postcopy_outgoing_completed(s);
+        DPRINTF("PO_STATE_COMPLETED\n");
+        return 1;
+    case PO_STATE_ERROR_RECEIVE:
+        postcopy_outgoing_completed(s);
+        DPRINTF("PO_STATE_ERROR_RECEIVE\n");
+        return -1;
+    default:
+        abort();
+    }
+
+    DPRINTF("outgoing background state: %d\n", s->state);
+    i = 0;
+    t0 = qemu_get_clock_ns(rt_clock);
+    while (qemu_file_rate_limit(f) == 0) {
+        int nfds = -1;
+        fd_set readfds;
+        struct timeval timeout = {.tv_sec = 0, .tv_usec = 0};
+        int ret;
+
+        if (ram_save_block(f, false) == 0) { /* no more blocks */
+            DPRINTF("outgoing background all sent\n");
+            assert(s->state == PO_STATE_ACTIVE);
+            postcopy_outgoing_ram_all_sent(f, s);
+            return 0;
+        }
+
+        FD_ZERO(&readfds);
+        set_fd(s->fd_read, &readfds, &nfds);
+        ret = select(nfds + 1, &readfds, NULL, NULL, &timeout);
+        if (ret >= 0 && FD_ISSET(s->fd_read, &readfds)) {
+            /* page request is pending */
+            DPRINTF("pending request\n");
+            break;
+        }
+
+        /* stolen from ram_save_iterate() */
+        if ((i & 63) == 0) {
+            int64_t t1 = (qemu_get_clock_ns(rt_clock) - t0) / 1000000;
+            if (t1 > MAX_WAIT) {
+                DPRINTF("too long %"PRIu64"\n", t1);
+                break;
+            }
+        }
+        i++;
+    }
+
+    return 0;
+}
+
+/***************************************************************************
  * incoming part
  */
 
diff --git a/migration-tcp.c b/migration-tcp.c
index 69c655d..506246e 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -64,7 +64,11 @@ static void tcp_wait_for_connect(int fd, void *opaque)
     } else {
         DPRINTF("migrate connect success\n");
         s->fd = fd;
-        migrate_fd_connect(s);
+        if (postcopy_outgoing_create_read_socket(s) < 0) {
+            migrate_fd_error(s);
+        } else {
+            migrate_fd_connect(s);
+        }
     }
 }
 
diff --git a/migration-unix.c b/migration-unix.c
index d4e2431..7fc4906 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -71,12 +71,20 @@ static void unix_wait_for_connect(void *opaque)
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 
-    if (val == 0)
+    if (val == 0) {
+        ret = postcopy_outgoing_create_read_socket(s);
+        if (ret < 0) {
+            goto error_out;
+        }
         migrate_fd_connect(s);
-    else {
+    } else {
         DPRINTF("error connecting %d\n", val);
-        migrate_fd_error(s);
+        goto error_out;
     }
+    return;
+
+error_out:
+    migrate_fd_error(s);
 }
 
 int unix_start_outgoing_migration(MigrationState *s, const char *path)
@@ -111,11 +119,19 @@ int unix_start_outgoing_migration(MigrationState *s, const char *path)
 
     if (ret < 0) {
         DPRINTF("connect failed\n");
-        migrate_fd_error(s);
-        return ret;
+        goto error_out;
+    }
+
+    ret = postcopy_outgoing_create_read_socket(s);
+    if (ret < 0) {
+        goto error_out;
     }
     migrate_fd_connect(s);
     return 0;
+
+error_out:
+    migrate_fd_error(s);
+    return ret;
 }
 
 static void unix_accept_incoming_migration(void *opaque)
diff --git a/migration.c b/migration.c
index 8bb6073..85f8f71 100644
--- a/migration.c
+++ b/migration.c
@@ -41,6 +41,11 @@ enum {
     MIG_STATE_COMPLETED,
 };
 
+enum {
+    MIG_SUBSTATE_PRECOPY,
+    MIG_SUBSTATE_POSTCOPY,
+};
+
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
 /* Migration XBZRLE default cache size */
@@ -328,6 +333,17 @@ void migrate_fd_put_ready(MigrationState *s)
         return;
     }
 
+    if (s->substate == MIG_SUBSTATE_POSTCOPY) {
+        /* PRINTF("postcopy background\n"); */
+        ret = postcopy_outgoing_ram_save_background(s->file, s->postcopy);
+        if (ret > 0) {
+            migrate_fd_completed(s);
+        } else if (ret < 0) {
+            migrate_fd_error(s);
+        }
+        return;
+    }
+
     DPRINTF("iterate\n");
     ret = qemu_savevm_state_iterate(s->file);
     if (ret < 0) {
@@ -341,7 +357,20 @@ void migrate_fd_put_ready(MigrationState *s)
         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
         vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 
-        if (qemu_savevm_state_complete(s->file) < 0) {
+        if (s->params.postcopy) {
+            if (qemu_savevm_state_complete(s->file, &s->params) < 0) {
+                migrate_fd_error(s);
+                if (old_vm_running) {
+                    vm_start();
+                }
+                return;
+            }
+            s->substate = MIG_SUBSTATE_POSTCOPY;
+            s->postcopy = postcopy_outgoing_begin(s);
+            return;
+        }
+
+        if (qemu_savevm_state_complete(s->file, &s->params) < 0) {
             migrate_fd_error(s);
         } else {
             migrate_fd_completed(s);
@@ -431,6 +460,7 @@ void migrate_fd_connect(MigrationState *s)
     int ret;
 
     s->state = MIG_STATE_ACTIVE;
+    s->substate = MIG_SUBSTATE_PRECOPY;
     s->file = qemu_fopen_ops_buffered(s);
 
     DPRINTF("beginning savevm\n");
diff --git a/migration.h b/migration.h
index b21df18..9b3c03b 100644
--- a/migration.h
+++ b/migration.h
@@ -28,6 +28,7 @@ struct MigrationParams {
 };
 
 typedef struct MigrationState MigrationState;
+typedef struct PostcopyOutgoingState PostcopyOutgoingState;
 
 struct MigrationState
 {
@@ -46,6 +47,12 @@ struct MigrationState
     int64_t dirty_pages_rate;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
     int64_t xbzrle_cache_size;
+
+    /* for postcopy */
+    int substate;              /* precopy or postcopy */
+    int fd_read;
+    QEMUFile *file_read;        /* connection from the destination */
+    PostcopyOutgoingState *postcopy;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -135,6 +142,17 @@ int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+/* For outgoing postcopy */
+int postcopy_outgoing_create_read_socket(MigrationState *s);
+void postcopy_outgoing_state_begin(QEMUFile *f);
+void postcopy_outgoing_state_complete(
+    QEMUFile *f, const uint8_t *buffer, size_t buffer_size);
+int postcopy_outgoing_ram_save_iterate(QEMUFile *f, void *opaque);
+int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque);
+
+PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *s);
+int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy);
+
 /* For incoming postcopy */
 extern bool incoming_postcopy;
 
diff --git a/savevm.c b/savevm.c
index 04b03cf..675f9a5 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1647,6 +1647,12 @@ int qemu_savevm_state_begin(QEMUFile *f,
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
     qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 
+    if (params->postcopy) {
+        /* tell this is postcopy */
+        qemu_put_byte(f, QEMU_VM_POSTCOPY);
+        postcopy_outgoing_state_begin(f);
+    }
+
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
 
@@ -1734,8 +1740,10 @@ int qemu_savevm_state_iterate(QEMUFile *f)
     return ret;
 }
 
-int qemu_savevm_state_complete(QEMUFile *f)
+int qemu_savevm_state_complete(QEMUFile *f, const MigrationParams *params)
 {
+    QEMUFile *orig_f = NULL;
+    QEMUFileBuf *buf_file = NULL;
     SaveStateEntry *se;
     int ret;
 
@@ -1762,6 +1770,20 @@ int qemu_savevm_state_complete(QEMUFile *f)
         }
     }
 
+    if (params->postcopy) {
+        /* VMStateDescription:pre/post_load and
+         * cpu_synchronize_all_post_init() may fault on guest RAM
+         * (MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME), so the
+         * postcopy threads need to be created before the fault.
+         *
+         * This is hacky, but it's because size of section/state structure
+         * can't be easily determined without actual loading.
+         */
+        orig_f = f;
+        buf_file = qemu_fopen_buf_write();
+        f = buf_file->file;
+    }
+
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
 
@@ -1787,6 +1809,15 @@ int qemu_savevm_state_complete(QEMUFile *f)
 
     qemu_put_byte(f, QEMU_VM_EOF);
 
+    if (params->postcopy) {
+        qemu_fflush(f);
+        qemu_put_byte(orig_f, QEMU_VM_POSTCOPY);
+        postcopy_outgoing_state_complete(
+            orig_f, buf_file->buffer, buf_file->buffer_size);
+        qemu_fclose(f);
+        f = orig_f;
+    }
+
     return qemu_file_get_error(f);
 }
 
@@ -1825,7 +1856,7 @@ static int qemu_savevm_state(QEMUFile *f)
             goto out;
     } while (ret == 0);
 
-    ret = qemu_savevm_state_complete(f);
+    ret = qemu_savevm_state_complete(f, &params);
 
 out:
     if (ret == 0) {
diff --git a/sysemu.h b/sysemu.h
index 0c39a3a..f1129e7 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -81,7 +81,7 @@ bool qemu_savevm_state_blocked(Error **errp);
 int qemu_savevm_state_begin(QEMUFile *f,
                             const MigrationParams *params);
 int qemu_savevm_state_iterate(QEMUFile *f);
-int qemu_savevm_state_complete(QEMUFile *f);
+int qemu_savevm_state_complete(QEMUFile *f, const MigrationParams *params);
 void qemu_savevm_state_cancel(QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Qemu-devel] [PATCH v3 26/35] postcopy/outgoing: add -n options to disable background transfer
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (24 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 25/35] postcopy: implement outgoing part of postcopy live migration Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-11-01 19:56   ` Eric Blake
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault Isaku Yamahata
                   ` (11 subsequent siblings)
  37 siblings, 1 reply; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

This option is for benchmarking purposes.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hmp-commands.hx      |   10 ++++++----
 hmp.c                |    4 +++-
 migration-postcopy.c |    7 +++++++
 migration.c          |    4 +++-
 migration.h          |    1 +
 qapi-schema.json     |    2 +-
 qmp-commands.hx      |    3 ++-
 savevm.c             |    1 +
 8 files changed, 24 insertions(+), 8 deletions(-)
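
To illustrate what -n changes: with nobg set, the background step pushes
no pages by itself and only notices when the on-demand requests have
drained everything. The following is a toy model of that behaviour, not
QEMU code. struct toy_src and the toy_* helpers are hypothetical names, and
a plain bool array stands in for the migration bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8 /* toy guest size, in pages */

struct toy_src {
    bool pending[NPAGES]; /* true while the page has not been sent */
    bool nobg;            /* models MigrationParams.nobg */
    bool all_sent;        /* models PO_STATE_ALL_PAGES_SENT */
};

static void toy_init(struct toy_src *s, bool nobg)
{
    for (size_t i = 0; i < NPAGES; i++) {
        s->pending[i] = true;
    }
    s->nobg = nobg;
    s->all_sent = false;
}

static size_t toy_remaining(const struct toy_src *s)
{
    size_t n = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        n += s->pending[i] ? 1 : 0;
    }
    return n;
}

/* On-demand request from the destination; true if the page was sent now. */
static bool toy_serve_request(struct toy_src *s, size_t pg)
{
    assert(pg < NPAGES);
    if (!s->pending[pg]) {
        return false;
    }
    s->pending[pg] = false;
    return true;
}

/* One background step; returns the number of pages it pushed. */
static size_t toy_background_step(struct toy_src *s, size_t budget)
{
    size_t sent = 0;

    if (s->all_sent) {
        return 0;
    }
    if (!s->nobg) {
        /* normal postcopy: push pending pages up to the budget */
        for (size_t pg = 0; pg < NPAGES && sent < budget; pg++) {
            if (s->pending[pg]) {
                s->pending[pg] = false;
                sent++;
            }
        }
    }
    /* with nobg, only check whether on-demand traffic drained everything */
    if (toy_remaining(s) == 0) {
        s->all_sent = true;
    }
    return sent;
}
```

With nobg every page transfer is the result of a fault on the destination,
which is what makes the option useful for measuring the on-demand path in
isolation.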

diff --git a/hmp-commands.hx b/hmp-commands.hx
index f2f1264..b054760 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,25 +826,27 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s",
-        .params     = "[-d] [-b] [-i] [-p] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
 		      "(base image shared between src and destination)"
-		      "\n\t\t\t-p for migration with postcopy mode enabled",
+		      "\n\t\t\t-p for migration with postcopy mode enabled"
+		      "\n\t\t\t-n for no background transfer of postcopy mode",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
 	-p for migration with postcopy mode enabled
+	-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index 2ea3bc4..203b552 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1036,11 +1036,13 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+    int nobg = qdict_get_try_bool(qdict, "nobg", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
-                !!postcopy, postcopy, &err);
+                !!postcopy, postcopy, !!nobg, nobg,
+                &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
         error_free(err);
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 399e233..5f98ae6 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -557,6 +557,13 @@ int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy)
         abort();
     }
 
+    if (s->ms->params.nobg) {
+        if (ram_bytes_remaining() == 0) {
+            postcopy_outgoing_ram_all_sent(f, s);
+        }
+        return 0;
+    }
+
     DPRINTF("outgoing background state: %d\n", s->state);
     i = 0;
     t0 = qemu_get_clock_ns(rt_clock);
diff --git a/migration.c b/migration.c
index 85f8f71..279dda5 100644
--- a/migration.c
+++ b/migration.c
@@ -510,7 +510,8 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 bool has_postcopy, bool postcopy, Error **errp)
+                 bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+                 Error **errp)
 {
     MigrationState *s = migrate_get_current();
     MigrationParams params;
@@ -520,6 +521,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = blk;
     params.shared = inc;
     params.postcopy = postcopy;
+    params.nobg = nobg;
 
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 9b3c03b..6724c19 100644
--- a/migration.h
+++ b/migration.h
@@ -25,6 +25,7 @@ struct MigrationParams {
     bool blk;
     bool shared;
     bool postcopy;
+    bool nobg;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index c969e5a..70d0577 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2095,7 +2095,7 @@
 ##
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
-           '*postcopy': 'bool'} }
+           '*postcopy': 'bool', '*nobg': 'bool'} }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index ece7a7e..defbeba 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -518,7 +518,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
@@ -533,6 +533,7 @@ Arguments:
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
 - "postcopy": postcopy migration (json-bool, optional)
+- "nobg": postcopy without background transfer (json-bool, optional)
 - "uri": Destination URI (json-string)
 
 Example:
diff --git a/savevm.c b/savevm.c
index 675f9a5..0a3acd8 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1839,6 +1839,7 @@ static int qemu_savevm_state(QEMUFile *f)
         .blk = 0,
         .shared = 0,
         .postcopy = 0,
+        .nobg = 0,
     };
 
     if (qemu_savevm_state_blocked(NULL)) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (25 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 26/35] postcopy/outgoing: add -n options to disable background transfer Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-11-01 20:10   ` Eric Blake
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 28/35] arch_init: factor out setting last_block, last_offset Isaku Yamahata
                   ` (10 subsequent siblings)
  37 siblings, 1 reply; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

When a page is requested, the surrounding pages are also sent.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hmp-commands.hx      |   15 ++++++++-----
 hmp.c                |    3 +++
 migration-postcopy.c |   57 +++++++++++++++++++++++++++++++++++++++++++++-----
 migration.c          |   20 ++++++++++++++++++
 migration.h          |    2 ++
 qapi-schema.json     |    3 ++-
 6 files changed, 89 insertions(+), 11 deletions(-)
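
A rough sketch of the prefault range handling, for illustration only. It
mirrors the bounds checks in postcopy_outgoing_ram_save_page(): a backward
neighbour that would underflow is skipped, and so is a forward neighbour
past the end of the RAM block. The sketch works in page units rather than
byte offsets, and the names prefault_target and prefault_plan, the order
(requested page, then forward, then backward) and the overflow guard are
assumptions made for this example rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the page at distance delta from pgoffset, forward or backward.
 * Returns 1 and stores the page in *out if it lies inside a block of
 * block_pages pages, 0 otherwise.
 */
static int prefault_target(uint64_t pgoffset, int forward, uint64_t delta,
                           uint64_t block_pages, uint64_t *out)
{
    assert(out != NULL);
    if (forward) {
        if (delta > UINT64_MAX - pgoffset) {
            return 0;             /* would overflow (guard added here) */
        }
        pgoffset += delta;
    } else {
        if (pgoffset < delta) {
            return 0;             /* would underflow below page 0 */
        }
        pgoffset -= delta;
    }
    if (pgoffset >= block_pages) {
        return 0;                 /* past the end of the RAM block */
    }
    *out = pgoffset;
    return 1;
}

/*
 * List the pages to send for one requested page: the page itself, then up
 * to nforward following pages, then up to nbackward preceding pages.
 * Returns the number of entries written to out (at most cap).
 */
static size_t prefault_plan(uint64_t pgoffset, uint64_t nforward,
                            uint64_t nbackward, uint64_t block_pages,
                            uint64_t *out, size_t cap)
{
    size_t n = 0;
    uint64_t t;

    if (n < cap && prefault_target(pgoffset, 1, 0, block_pages, &t)) {
        out[n++] = t;
    }
    for (uint64_t j = 0; j < nforward && n < cap; j++) {
        if (prefault_target(pgoffset, 1, j + 1, block_pages, &t)) {
            out[n++] = t;
        }
    }
    for (uint64_t j = 0; j < nbackward && n < cap; j++) {
        if (prefault_target(pgoffset, 0, j + 1, block_pages, &t)) {
            out[n++] = t;
        }
    }
    return n;
}
```

For a request for page 5 with forward=2 and backward=1 in a 100-page block,
the plan is pages 5, 6, 7 and 4. A request for page 0 with backward=3 sends
only the pages that exist.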

diff --git a/hmp-commands.hx b/hmp-commands.hx
index b054760..5e2c77c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,26 +826,31 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
-        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+	              "forward:i?,backward:i?",
+        .params     = "[-d] [-b] [-i] [-p [-n]] uri [forward] [backward]",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
 		      "(base image shared between src and destination)"
 		      "\n\t\t\t-p for migration with postcopy mode enabled"
-		      "\n\t\t\t-n for no background transfer of postcopy mode",
+		      "\n\t\t\t-n for no background transfer of postcopy mode"
+		      "\n\t\t\tforward: the number of pages to "
+		      "forward-prefault when postcopy (default 0)"
+		      "\n\t\t\tbackward: the number of pages to "
+		      "backward-prefault when postcopy (default 0)",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
-	-p for migration with postcopy mode enabled
+	-p for migration with postcopy mode enabled (forward/backward is prefault size when postcopy)
 	-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index 203b552..fb1275d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1037,11 +1037,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
+    int forward = qdict_get_try_int(qdict, "forward", 0);
+    int backward = qdict_get_try_int(qdict, "backward", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
                 !!postcopy, postcopy, !!nobg, nobg,
+                !!forward, forward, !!backward, backward,
                 &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 5f98ae6..3d51898 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -344,6 +344,37 @@ int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
+static void postcopy_outgoing_ram_save_page(PostcopyOutgoingState *s,
+                                            uint64_t pgoffset, bool *written,
+                                            bool forward,
+                                            int prefault_pgoffset)
+{
+    ram_addr_t offset;
+    int ret;
+
+    if (forward) {
+        pgoffset += prefault_pgoffset;
+    } else {
+        if (pgoffset < prefault_pgoffset) {
+            return;
+        }
+        pgoffset -= prefault_pgoffset;
+    }
+
+    offset = pgoffset << TARGET_PAGE_BITS;
+    if (offset >= s->last_block_read->length) {
+        assert(forward);
+        assert(prefault_pgoffset > 0);
+        return;
+    }
+
+    ret = ram_save_page(s->mig_buffered_write, s->last_block_read, offset,
+                        false);
+    if (ret > 0) {
+        *written = true;
+    }
+}
+
 /*
  * return value
  *   0: continue postcopy mode
@@ -355,6 +386,7 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
                                         bool *written)
 {
     int i;
+    uint64_t j;
     RAMBlock *block;
 
     DPRINTF("cmd %d state %d\n", req->cmd, s->state);
@@ -387,11 +419,26 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
             break;
         }
         for (i = 0; i < req->nr; i++) {
-            DPRINTF("offs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
-            int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
-                                    req->pgoffs[i] << TARGET_PAGE_BITS, false);
-            if (ret > 0) {
-                *written = true;
+            DPRINTF("pgoffs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
+            postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                            true, 0);
+        }
+        /* forward prefault */
+        for (j = 1; j <= s->ms->params.prefault_forward; j++) {
+            for (i = 0; i < req->nr; i++) {
+                DPRINTF("pgoffs[%d] + 0x%"PRIx64" 0x%"PRIx64"\n",
+                        i, j, req->pgoffs[i] + j);
+                postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                                true, j);
+            }
+        }
+        /* backward prefault */
+        for (j = 1; j <= s->ms->params.prefault_backward; j++) {
+            for (i = 0; i < req->nr; i++) {
+                DPRINTF("pgoffs[%d] - 0x%"PRIx64" 0x%"PRIx64"\n",
+                        i, j, req->pgoffs[i] - j);
+                postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                                false, j);
             }
         }
         break;
diff --git a/migration.c b/migration.c
index 279dda5..f29e3bb 100644
--- a/migration.c
+++ b/migration.c
@@ -511,6 +511,8 @@ void migrate_del_blocker(Error *reason)
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
                  bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+                 bool has_forward, int64_t forward,
+                 bool has_backward, int64_t backward,
                  Error **errp)
 {
     MigrationState *s = migrate_get_current();
@@ -522,6 +524,24 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.shared = inc;
     params.postcopy = postcopy;
     params.nobg = nobg;
+    params.prefault_forward = 0;
+    if (has_forward) {
+        if (forward < 0) {
+            error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+                      "forward", "forward >= 0");
+            return;
+        }
+        params.prefault_forward = forward;
+    }
+    params.prefault_backward = 0;
+    if (has_backward) {
+        if (backward < 0) {
+            error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+                      "backward", "backward >= 0");
+            return;
+        }
+        params.prefault_backward = backward;
+    }
 
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 6724c19..8462251 100644
--- a/migration.h
+++ b/migration.h
@@ -26,6 +26,8 @@ struct MigrationParams {
     bool shared;
     bool postcopy;
     bool nobg;
+    int64_t prefault_forward;
+    int64_t prefault_backward;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index 70d0577..746bf21 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2095,7 +2095,8 @@
 ##
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
-           '*postcopy': 'bool', '*nobg': 'bool'} }
+           '*postcopy': 'bool', '*nobg': 'bool',
+           '*forward': 'int', '*backward': 'int'} }
 
 # @xen-save-devices-state:
 #
-- 
1.7.10.4
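
Note on the prefault bounds check: the sketch below restates the offset
arithmetic of postcopy_outgoing_ram_save_page() as a standalone function,
assuming page-granular offsets and a block size expressed in pages. The name
prefault_target() and its parameters are illustrative only and do not appear
in the patch; the overflow check on the forward path is an addition of this
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): compute the page that a
 * prefault step would send, given the faulting page `pgoffset`, the
 * distance `delta` in pages, and the block size `block_pages` in pages.
 * Returns false when the target falls outside the block.
 */
static bool prefault_target(uint64_t pgoffset, uint64_t delta, bool forward,
                            uint64_t block_pages, uint64_t *target)
{
    if (forward) {
        if (delta > UINT64_MAX - pgoffset) {
            return false;           /* addition would overflow */
        }
        pgoffset += delta;
    } else {
        if (pgoffset < delta) {
            return false;           /* would go below page 0 */
        }
        pgoffset -= delta;
    }
    if (pgoffset >= block_pages) {
        return false;               /* past the end of the RAM block */
    }
    *target = pgoffset;
    return true;
}
```

With a 16-page block, a fault on page 10 with a forward distance of 2 targets
page 12, while a backward distance of 2 from page 1 is skipped instead of
wrapping around.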

* [Qemu-devel] [PATCH v3 28/35] arch_init: factor out setting last_block, last_offset
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (26 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 29/35] postcopy/outgoing: add movebg mode(-m) to migration command Isaku Yamahata
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   10 +++++++---
 arch_init.h |    1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d95ce7b..9137013 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -416,6 +416,12 @@ static void migration_bitmap_sync(void)
 
 static uint64_t bytes_transferred;
 
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset)
+{
+    last_block = block;
+    last_offset = offset;
+}
+
 /*
  * ram_save_page: Writes a page of memory to the stream f
  *
@@ -496,9 +502,7 @@ bool ram_save_block(QEMUFile *f, bool last_stage)
         }
     } while (block != last_block || offset != last_offset);
 
-    last_block = block;
-    last_offset = offset;
-
+    ram_save_set_last_block(block, offset);
     return wrote;
 }
 
diff --git a/arch_init.h b/arch_init.h
index 499d0f1..9165456 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -49,6 +49,7 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 int ram_load_page(QEMUFile *f, void *host, int flags);
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
 bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                    bool last_stage);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
-- 
1.7.10.4

* [Qemu-devel] [PATCH v3 29/35] postcopy/outgoing: add movebg mode(-m) to migration command
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (27 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 28/35] arch_init: factor out setting last_block, last_offset Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-11-01 20:15   ` Eric Blake
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 30/35] arch_init: factor out ram_load Isaku Yamahata
                   ` (8 subsequent siblings)
  37 siblings, 1 reply; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

When movebg mode is enabled, the position from which background pages are
sent is moved to follow the most recently requested on-demand page
(including any forward prefault).

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
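Note on the movebg position: the following sketch isolates the offset
computation that this patch adds to postcopy_outgoing_handle_req(). It
assumes 4 KiB target pages and a block at least one page long; the helper
name movebg_last_offset() and the SKETCH_* macros are hypothetical stand-ins
for TARGET_PAGE_BITS/TARGET_PAGE_SIZE and are not part of the patch.

```c
#include <stdint.h>

#define SKETCH_PAGE_BITS 12                         /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE (UINT64_C(1) << SKETCH_PAGE_BITS)

/*
 * Hypothetical helper: byte offset at which background transfer resumes
 * after an on-demand request. `last_req_pgoff` is the last requested page,
 * `prefault_forward` the forward prefault distance in pages, and
 * `block_length` the RAM block length in bytes (>= one page).
 */
static uint64_t movebg_last_offset(uint64_t last_req_pgoff,
                                   uint64_t prefault_forward,
                                   uint64_t block_length)
{
    uint64_t offset = (last_req_pgoff + prefault_forward) << SKETCH_PAGE_BITS;
    uint64_t last_page = block_length - SKETCH_PAGE_SIZE;

    /* Clamp to the last page of the block, as the MIN() in the patch does. */
    return offset < last_page ? offset : last_page;
}
```

For a 16-page block, a request ending at page 3 with a forward prefault of 2
resumes at page 5; a request ending at page 14 with a forward prefault of 4
is clamped to page 15.
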
 hmp-commands.hx      |    8 +++++---
 hmp.c                |    3 ++-
 migration-postcopy.c |    8 ++++++++
 migration.c          |    5 ++++-
 migration.h          |    1 +
 qapi-schema.json     |    2 +-
 qmp-commands.hx      |    2 +-
 savevm.c             |    1 +
 8 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 5e2c77c..942f620 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,15 +826,16 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,"
 	              "forward:i?,backward:i?",
-        .params     = "[-d] [-b] [-i] [-p [-n]] uri [forward] [backward]",
+        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri [forward] [backward]",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
 		      "(base image shared between src and destination)"
 		      "\n\t\t\t-p for migration with postcopy mode enabled"
+		      "\n\t\t\t-m for moving the background transfer position of postcopy mode"
 		      "\n\t\t\t-n for no background transfer of postcopy mode"
 		      "\n\t\t\tforward: the number of pages to "
 		      "forward-prefault when postcopy (default 0)"
@@ -845,12 +846,13 @@ ETEXI
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward}
+@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
 	-p for migration with postcopy mode enabled (forward/backward is prefault size when postcopy)
+	-m for migration with postcopy mode enabled, moving the background transfer position
 	-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index fb1275d..a0bd869 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1036,6 +1036,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+    int movebg = qdict_get_try_bool(qdict, "movebg", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
     int forward = qdict_get_try_int(qdict, "forward", 0);
     int backward = qdict_get_try_int(qdict, "backward", 0);
@@ -1043,7 +1044,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     Error *err = NULL;
 
     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
-                !!postcopy, postcopy, !!nobg, nobg,
+                !!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg,
                 !!forward, forward, !!backward, backward,
                 &err);
     if (err) {
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 3d51898..421fb39 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -432,6 +432,14 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
                                                 true, j);
             }
         }
+        if (s->ms->params.movebg) {
+            ram_addr_t last_offset =
+                (req->pgoffs[req->nr - 1] + s->ms->params.prefault_forward) <<
+                TARGET_PAGE_BITS;
+            last_offset = MIN(last_offset,
+                              s->last_block_read->length - TARGET_PAGE_SIZE);
+            ram_save_set_last_block(s->last_block_read, last_offset);
+        }
         /* backward prefault */
         for (j = 1; j <= s->ms->params.prefault_backward; j++) {
             for (i = 0; i < req->nr; i++) {
diff --git a/migration.c b/migration.c
index f29e3bb..057ea31 100644
--- a/migration.c
+++ b/migration.c
@@ -510,7 +510,9 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+                 bool has_postcopy, bool postcopy,
+                 bool has_movebg, bool movebg,
+                 bool has_nobg, bool nobg,
                  bool has_forward, int64_t forward,
                  bool has_backward, int64_t backward,
                  Error **errp)
@@ -524,6 +526,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.shared = inc;
     params.postcopy = postcopy;
     params.nobg = nobg;
+    params.movebg = movebg;
     params.prefault_forward = 0;
     if (has_forward) {
         if (forward < 0) {
diff --git a/migration.h b/migration.h
index 8462251..6cc3682 100644
--- a/migration.h
+++ b/migration.h
@@ -26,6 +26,7 @@ struct MigrationParams {
     bool shared;
     bool postcopy;
     bool nobg;
+    bool movebg;
     int64_t prefault_forward;
     int64_t prefault_backward;
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index 746bf21..cf5d988 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2095,7 +2095,7 @@
 ##
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
-           '*postcopy': 'bool', '*nobg': 'bool',
+           '*postcopy': 'bool', '*movebg': 'bool', '*nobg': 'bool',
            '*forward': 'int', '*backward': 'int'} }
 
 # @xen-save-devices-state:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index defbeba..7028ece 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -518,7 +518,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
diff --git a/savevm.c b/savevm.c
index 0a3acd8..8d26354 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1839,6 +1839,7 @@ static int qemu_savevm_state(QEMUFile *f)
         .blk = 0,
         .shared = 0,
         .postcopy = 0,
+        .movebg = 0,
         .nobg = 0,
     };
 
-- 
1.7.10.4

* [Qemu-devel] [PATCH v3 30/35] arch_init: factor out ram_load
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (28 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 29/35] postcopy/outgoing: add movebg mode(-m) to migration command Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 31/35] arch_init: export ram_save_iterate() Isaku Yamahata
                   ` (7 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
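Note on the refactoring: ram_load() now takes the address resolver as a
parameter, so precopy and postcopy can share the load loop while writing to
different destinations. The sketch below shows the same pattern in isolation.
All names (resolve_fn, load_page, struct region, resolve_region) are
hypothetical, and the resolver here takes a context pointer instead of a
QEMUFile, which is a simplification of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Resolver type: maps a stream offset to a destination pointer, or NULL. */
typedef void *(*resolve_fn)(void *ctx, uint64_t offset);

/* Copy one record to wherever `resolve` says it belongs. */
static int load_page(const uint8_t *src, size_t len, uint64_t offset,
                     resolve_fn resolve, void *ctx)
{
    void *host = resolve(ctx, offset);

    if (host == NULL) {
        return -1;              /* the patch returns -EINVAL in this case */
    }
    memcpy(host, src, len);
    return 0;
}

/* A stand-in destination: a flat buffer of `size` bytes. */
struct region {
    uint8_t *base;
    uint64_t size;
};

/*
 * Example resolver. It only checks the start offset; a real resolver
 * would also have to make sure the whole page fits.
 */
static void *resolve_region(void *ctx, uint64_t offset)
{
    struct region *r = ctx;

    return offset < r->size ? r->base + offset : NULL;
}
```

A second resolver pointing at a different backing store (shared memory in the
postcopy case) can be passed to the same load_page() without touching the
copy logic.
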
 arch_init.c |   13 ++++++++++---
 arch_init.h |    3 +++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9137013..f86a0b4 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -867,7 +867,9 @@ int ram_load_page(QEMUFile *f, void *host, int flags)
     return 0;
 }
 
-static int ram_load(QEMUFile *f, void *opaque, int version_id)
+int ram_load(QEMUFile *f, void *opaque, int version_id,
+             void *(host_from_stream_offset_p)(QEMUFile *f,
+                                               ram_addr_t offset, int flags))
 {
     ram_addr_t addr;
     int flags, ret = 0;
@@ -899,7 +901,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_XBZRLE)) {
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset_p(f, addr, flags);
             if (!host) {
                 return -EINVAL;
             }
@@ -922,6 +924,11 @@ done:
     return ret;
 }
 
+static int ram_load_precopy(QEMUFile *f, void *opaque, int version_id)
+{
+    return ram_load(f, opaque, version_id, &host_from_stream_offset);
+}
+
 static void ram_save_set_params(const MigrationParams *params, void *opaque)
 {
     if (params->postcopy) {
@@ -940,7 +947,7 @@ SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
     .save_live_complete = ram_save_complete,
-    .load_state = ram_load,
+    .load_state = ram_load_precopy,
     .cancel = ram_migration_cancel,
 };
 
diff --git a/arch_init.h b/arch_init.h
index 9165456..3977ca7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -54,6 +54,9 @@ bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                    bool last_stage);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
 int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
+int ram_load(QEMUFile *f, void *opaque, int version_id,
+             void *(host_from_stream_offset_p)(QEMUFile *f,
+                                               ram_addr_t offset, int flags));
 #endif
 
 #endif
-- 
1.7.10.4

* [Qemu-devel] [PATCH v3 31/35] arch_init: export ram_save_iterate()
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (29 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 30/35] arch_init: factor out ram_load Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 32/35] postcopy: pre+post optimization incoming side Isaku Yamahata
                   ` (6 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
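Note on the wrapper: ram_save_iterate() loses its unused opaque argument so
that other code can call it directly, and a thin wrapper keeps the
two-argument signature that the handler table expects. A minimal sketch of
that adapter pattern, using hypothetical names (struct stream, save_iterate,
save_iterate_handler) in place of the QEMU types:

```c
#include <stddef.h>

/* Stand-in for QEMUFile in this sketch. */
struct stream {
    int pages_sent;
};

/* Exported form: callable without an opaque pointer. */
static int save_iterate(struct stream *f)
{
    f->pages_sent++;
    return 0;
}

/* Signature required by the (hypothetical) handler table. */
typedef int (*iterate_handler)(struct stream *f, void *opaque);

/* Adapter: ignores `opaque` and forwards to the exported function. */
static int save_iterate_handler(struct stream *f, void *opaque)
{
    (void)opaque;
    return save_iterate(f);
}
```
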
 arch_init.c |   11 ++++++++---
 arch_init.h |    1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index f86a0b4..48f45cd 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -633,7 +633,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static int ram_save_iterate(QEMUFile *f, void *opaque)
+int ram_save_iterate(QEMUFile *f)
 {
     uint64_t bytes_transferred_last;
     double bwidth = 0;
@@ -705,6 +705,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     return 0;
 }
 
+static int ram_save_iterate_bwidth(QEMUFile *f, void *opaque)
+{
+    return ram_save_iterate(f);
+}
+
 static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     migration_bitmap_sync();
@@ -937,7 +942,7 @@ static void ram_save_set_params(const MigrationParams *params, void *opaque)
         savevm_ram_handlers.save_live_complete =
             postcopy_outgoing_ram_save_complete;
     } else {
-        savevm_ram_handlers.save_live_iterate = ram_save_iterate;
+        savevm_ram_handlers.save_live_iterate = ram_save_iterate_bwidth;
         savevm_ram_handlers.save_live_complete = ram_save_complete;
     }
 }
@@ -945,7 +950,7 @@ static void ram_save_set_params(const MigrationParams *params, void *opaque)
 SaveVMHandlers savevm_ram_handlers = {
     .set_params = ram_save_set_params,
     .save_live_setup = ram_save_setup,
-    .save_live_iterate = ram_save_iterate,
+    .save_live_iterate = ram_save_iterate_bwidth,
     .save_live_complete = ram_save_complete,
     .load_state = ram_load_precopy,
     .cancel = ram_migration_cancel,
diff --git a/arch_init.h b/arch_init.h
index 3977ca7..966b25a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -47,6 +47,7 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 #define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
 
 int ram_load_page(QEMUFile *f, void *host, int flags);
+int ram_save_iterate(QEMUFile *f);
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
-- 
1.7.10.4

* [Qemu-devel] [PATCH v3 32/35] postcopy: pre+post optimization incoming side
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (30 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 31/35] arch_init: export ram_save_iterate() Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 33/35] arch_init: export migration_bitmap_sync and helper method to get bitmap Isaku Yamahata
                   ` (5 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
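Note on option handling: the incoming side accepts the new
POSTCOPY_OPTION_PRECOPY bit and still rejects any option bit it does not
know. The sketch below shows that check on its own; parse_init_options() and
SKETCH_OPTION_PRECOPY are hypothetical names, and the -1 return stands in for
the -ENOSYS used by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as POSTCOPY_OPTION_PRECOPY in the patch. */
#define SKETCH_OPTION_PRECOPY UINT64_C(1)

/*
 * Hypothetical helper: record whether precopy was enabled by the sender
 * and fail if any unrecognised option bit remains.
 */
static int parse_init_options(uint64_t options, bool *precopy)
{
    *precopy = (options & SKETCH_OPTION_PRECOPY) != 0;
    options &= ~SKETCH_OPTION_PRECOPY;

    return options ? -1 : 0;    /* unknown bits: refuse the stream */
}
```

Clearing each known bit before the final test means that a future sender
setting a new bit is refused by an older receiver instead of being silently
misinterpreted.
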
 migration-postcopy.c |  207 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 204 insertions(+), 3 deletions(-)

diff --git a/migration-postcopy.c b/migration-postcopy.c
index 421fb39..9298cd4 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -274,6 +274,9 @@ static void postcopy_outgoing_free_req(struct qemu_umem_req *req)
 #define QEMU_VM_POSTCOPY_INIT           0
 #define QEMU_VM_POSTCOPY_SECTION_FULL   1
 
+/* options in QEMU_VM_POSTCOPY_INIT section */
+#define POSTCOPY_OPTION_PRECOPY         1ULL
+
 /***************************************************************************
  * outgoing part
  */
@@ -739,6 +742,7 @@ struct PostcopyIncomingUMemDaemon {
     int nr_target_pages_per_host_page;
     int target_to_host_page_shift;
     int version_id;     /* save/load format version id */
+    bool precopy_enabled;
 
     QemuThread thread;
     QLIST_HEAD(, UMemBlock) blocks;
@@ -784,6 +788,7 @@ static PostcopyIncomingState state = {
 
 static PostcopyIncomingUMemDaemon umemd = {
     .state = 0,
+    .precopy_enabled = false,
     .to_qemu_fd = -1,
     .to_qemu = NULL,
     .from_qemu_fd = -1,
@@ -797,6 +802,8 @@ static PostcopyIncomingUMemDaemon umemd = {
 
 static void *postcopy_incoming_umemd(void*);
 static void postcopy_incoming_qemu_handle_req(void *opaque);
+static UMemBlock *postcopy_incoming_umem_block_from_stream(
+    QEMUFile *f, int flags);
 
 /* protected by qemu_mutex_lock_ramlist() */
 void postcopy_incoming_ram_free(RAMBlock *ram_block)
@@ -875,6 +882,25 @@ int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id)
     return -EINVAL;
 }
 
+static void*
+postcopy_incoming_shmem_from_stream_offset(QEMUFile *f, ram_addr_t offset,
+                                           int flags)
+{
+    UMemBlock *block = postcopy_incoming_umem_block_from_stream(f, flags);
+    if (block == NULL) {
+        DPRINTF("error block = NULL\n");
+        return NULL;
+    }
+    return block->umem->shmem + offset;
+}
+
+static int postcopy_incoming_ram_load_precopy(QEMUFile *f, void *opaque,
+                                              int version_id)
+{
+    return ram_load(f, opaque, version_id,
+                    &postcopy_incoming_shmem_from_stream_offset);
+}
+
 static void postcopy_incoming_umem_block_free(void)
 {
     UMemBlock *block;
@@ -982,6 +1008,12 @@ static int postcopy_incoming_loadvm_init(QEMUFile *f, uint32_t size)
         return -EINVAL;
     }
     options = qemu_get_be64(f);
+    if (options & POSTCOPY_OPTION_PRECOPY) {
+        options &= ~POSTCOPY_OPTION_PRECOPY;
+        umemd.precopy_enabled = true;
+    } else {
+        umemd.precopy_enabled = false;
+    }
     if (options) {
         fprintf(stderr, "unknown options 0x%"PRIx64, options);
         return -ENOSYS;
@@ -999,12 +1031,17 @@ static int postcopy_incoming_loadvm_init(QEMUFile *f, uint32_t size)
         return -ENOSYS;
     }
 
-    DPRINTF("detected POSTCOPY\n");
+    DPRINTF("detected POSTCOPY precopy %d\n", umemd.precopy_enabled);
     error = postcopy_incoming_prepare();
     if (error) {
         return error;
     }
-    savevm_ram_handlers.load_state = postcopy_incoming_ram_load;
+    if (umemd.precopy_enabled) {
+        savevm_ram_handlers.load_state = postcopy_incoming_ram_load_precopy;
+    } else {
+        savevm_ram_handlers.load_state = postcopy_incoming_ram_load;
+    }
+
     incoming_postcopy = true;
     return 0;
 }
@@ -1515,6 +1552,169 @@ static int postcopy_incoming_umem_ram_load(void)
     return 0;
 }
 
+static int postcopy_incoming_umemd_read_dirty_bitmap(
+    QEMUFile *f, const char *idstr, uint8_t idlen,
+    uint64_t block_offset, uint64_t block_length, uint64_t bitmap_length)
+{
+    UMemBlock *block;
+    uint64_t bit_start = block_offset >> TARGET_PAGE_BITS;
+    uint64_t bit_end = (block_offset + block_length) >> TARGET_PAGE_BITS;
+    uint64_t bit_offset;
+    uint8_t *buffer;
+    uint64_t index;
+
+    if ((bitmap_length % sizeof(uint64_t)) != 0) {
+        return -EINVAL;
+    }
+    QLIST_FOREACH(block, &umemd.blocks, next) {
+        if (!strncmp(block->idstr, idstr, idlen)) {
+            break;
+        }
+    }
+    if (block == NULL) {
+        return -EINVAL;
+    }
+
+    DPRINTF("bitmap %s 0x%"PRIx64" 0x%"PRIx64" 0x%"PRIx64"\n",
+            block->idstr, block_offset, block_length, bitmap_length);
+    buffer = g_malloc(bitmap_length);
+    qemu_get_buffer(f, buffer, bitmap_length);
+
+    bit_offset = bit_start & ~63;
+    index = 0;
+    while (index < bitmap_length) {
+        uint64_t bitmap;
+        int i;
+        int j;
+        int bit;
+
+        bitmap = be64_to_cpup((uint64_t*)(buffer + index));
+        for (i = 0; i < 64; i++) {
+            bit = bit_offset + i;
+            if (bit < bit_start) {
+                continue;
+            }
+            if (bit >= bit_end) {
+                break;
+            }
+            if (!(bitmap & (1ULL << i))) {
+                set_bit(bit, umemd.phys_received);
+
+                /* this is racy, but write side just sends redundant request */
+                set_bit(bit, umemd.phys_requested);
+            }
+        }
+
+        umemd.page_cached->nr = 0;
+        if (TARGET_PAGE_SIZE >= umemd.host_page_size) {
+            for (i = 0; i < 64; i++) {
+                uint64_t pgoff;
+                bit = bit_offset + i;
+                if (bit < bit_start) {
+                    continue;
+                }
+                if (bit >= bit_end) {
+                    break;
+                }
+                if (!test_bit(bit, umemd.phys_received)) {
+                    continue;
+                }
+                pgoff = (bit - bit_start) << umemd.target_to_host_page_shift;
+                for (j = 0; j < umemd.nr_host_pages_per_target_page; j++) {
+                    umemd.page_cached->pgoffs[umemd.page_cached->nr] =
+                        pgoff + j;
+                    umemd.page_cached->nr++;
+                }
+            }
+        } else {
+            for (i = 0; i < 64; i += umemd.nr_target_pages_per_host_page) {
+                bool mark_cache = true;
+                bit = bit_offset + i;
+                if (bit < bit_start) {
+                    continue;
+                }
+                if (bit >= bit_end) {
+                    break;
+                }
+                if (!test_bit(bit, umemd.phys_received)) {
+                    continue;
+                }
+                for (j = 0; j < umemd.nr_target_pages_per_host_page; j++) {
+                    if (!test_bit(bit + j, umemd.phys_received)) {
+                        mark_cache = false;
+                        break;
+                    }
+                }
+                if (mark_cache) {
+                    umemd.page_cached->pgoffs[umemd.page_cached->nr] =
+                        (bit - bit_start) >>
+                        (umemd.host_page_shift - TARGET_PAGE_BITS);
+                    umemd.page_cached->nr++;
+                }
+            }
+        }
+
+        if (umemd.page_cached->nr > 0) {
+            umem_mark_page_cached(block->umem, umemd.page_cached);
+            postcopy_incoming_umem_page_fault(block, umemd.page_cached);
+        }
+
+        bit_offset += 64;
+        index += sizeof(bitmap);
+    }
+
+    g_free(buffer);
+    return 0;
+}
+
+static int postcopy_incoming_umemd_mig_read_init(void)
+{
+    QEMUFile *f = umemd.mig_read;
+#ifdef DEBUG_POSTCOPY
+    uint64_t start = qemu_get_clock_ns(rt_clock);
+    uint64_t end;
+#endif
+
+    if (!umemd.precopy_enabled) {
+        return 0;
+    }
+
+    for (;;) {
+        uint8_t idlen;
+        char idstr[256];
+        uint64_t block_offset;
+        uint64_t block_length;
+        uint64_t bitmap_length;
+        int ret;
+
+        idlen = qemu_get_byte(f);
+        qemu_get_buffer(f, (uint8_t*)idstr, idlen);
+        idstr[idlen] = 0;
+        block_offset = qemu_get_be64(f);
+        block_length = qemu_get_be64(f);
+        bitmap_length = qemu_get_be64(f);
+
+        if (idlen == 0 && block_offset == 0 && block_length == 0 &&
+            bitmap_length == 0) {
+            DPRINTF("bitmap done\n");
+            break;
+        }
+        ret = postcopy_incoming_umemd_read_dirty_bitmap(
+            f, idstr, idlen, block_offset, block_length, bitmap_length);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+    if (postcopy_incoming_umem_check_umem_done()) {
+        postcopy_incoming_umem_done();
+    }
+#ifdef DEBUG_POSTCOPY
+    end = qemu_get_clock_ns(rt_clock);
+    DPRINTF("bitmap %"PRIu64" nsec\n", end - start);
+#endif
+    return 0;
+}
+
 static int postcopy_incoming_umemd_mig_read_loop(void)
 {
     int error;
@@ -1704,7 +1904,8 @@ static void *postcopy_incoming_umemd(void* unused)
     qemu_thread_create(&umemd.mig_read_thread,
                        &postcopy_incoming_umemd_thread,
                        &(IncomingThread) {
-                           NULL, &postcopy_incoming_umemd_mig_read_loop,},
+                           &postcopy_incoming_umemd_mig_read_init,
+                           &postcopy_incoming_umemd_mig_read_loop,},
                        QEMU_THREAD_JOINABLE);
     qemu_thread_create(&umemd.mig_write_thread,
                        &postcopy_incoming_umemd_thread,
-- 
1.7.10.4
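
Note on the bitmap decoding: postcopy_incoming_umemd_read_dirty_bitmap()
walks the received dirty bitmap in big-endian 64-bit words and treats every
clear bit as a page that precopy already delivered. The sketch below
reproduces only that walk. It is a simplification under stated assumptions:
load_be64() replaces be64_to_cpup(), the `received` array is block-relative
with one byte per page (the patch sets bits in a global bitmap), and the
names are hypothetical.

```c
#include <stdint.h>

/* Portable big-endian 64-bit load from a byte buffer. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Mark pages in [bit_start, bit_end) whose dirty bit is clear as received.
 * The first word covers the 64-aligned range containing bit_start, as in
 * the patch. Returns the number of pages marked.
 */
static uint64_t mark_clean_pages(const uint8_t *bitmap, uint64_t bitmap_len,
                                 uint64_t bit_start, uint64_t bit_end,
                                 uint8_t *received)
{
    uint64_t bit_offset = bit_start & ~UINT64_C(63);
    uint64_t clean = 0;

    for (uint64_t index = 0; index + 8 <= bitmap_len; index += 8) {
        uint64_t word = load_be64(bitmap + index);

        for (int i = 0; i < 64; i++) {
            uint64_t bit = bit_offset + i;

            if (bit < bit_start) {
                continue;           /* padding before the block */
            }
            if (bit >= bit_end) {
                break;              /* padding after the block */
            }
            if (!(word & (UINT64_C(1) << i))) {
                received[bit - bit_start] = 1;  /* clean: already sent */
                clean++;
            }
        }
        bit_offset += 64;
    }
    return clean;
}
```

Using a 64-bit `bit` index here avoids the truncation that an `int` index
would risk for very large guests.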

* [Qemu-devel] [PATCH v3 33/35] arch_init: export migration_bitmap_sync and helper method to get bitmap
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (31 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 32/35] postcopy: pre+post optimization incoming side Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 34/35] postcopy/outgoing: introduce precopy_count parameter Isaku Yamahata
                   ` (4 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

These migration bitmap operations will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
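Reviewer note (not part of the commit message): a minimal sketch of how a
caller might consume a read-only bitmap pointer such as the one returned by
migration_bitmap_get(). Every demo_* name below is hypothetical and only
stands in for the file-local state in arch_init.c. The one-bit-per-page
layout is an assumption made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_PAGES 256

/* Hypothetical stand-in for the file-local migration_bitmap. */
static unsigned long demo_bitmap[(DEMO_NR_PAGES + DEMO_BITS_PER_LONG - 1) /
                                 DEMO_BITS_PER_LONG];

/* Same shape as migration_bitmap_get(): callers get a const view only. */
static const unsigned long *demo_bitmap_get(void)
{
    return demo_bitmap;
}

/* Writer side, kept private to the owner of the bitmap. */
static void demo_mark_dirty(size_t page)
{
    assert(page < DEMO_NR_PAGES);
    demo_bitmap[page / DEMO_BITS_PER_LONG] |=
        1UL << (page % DEMO_BITS_PER_LONG);
}

/* Reader side: test one page's dirty bit through the const pointer. */
static bool demo_page_is_dirty(size_t page)
{
    const unsigned long *bitmap = demo_bitmap_get();

    assert(page < DEMO_NR_PAGES);
    return (bitmap[page / DEMO_BITS_PER_LONG] >>
            (page % DEMO_BITS_PER_LONG)) & 1UL;
}
```

Returning a const pointer keeps all writes inside the owning file while
still letting another file walk the bitmap word by word.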
 arch_init.c |    7 ++++++-
 migration.h |    2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 48f45cd..49fbaff 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -345,6 +345,11 @@ void migration_bitmap_free(void)
     migration_bitmap = NULL;
 }
 
+const unsigned long *migration_bitmap_get(void)
+{
+    return migration_bitmap;
+}
+
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
                                                          ram_addr_t offset)
 {
@@ -373,7 +378,7 @@ static inline bool migration_bitmap_set_dirty(MemoryRegion *mr,
     return ret;
 }
 
-static void migration_bitmap_sync(void)
+void migration_bitmap_sync(void)
 {
     RAMBlock *block;
     ram_addr_t addr;
diff --git a/migration.h b/migration.h
index 6cc3682..2801e7e 100644
--- a/migration.h
+++ b/migration.h
@@ -111,6 +111,8 @@ uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 void migration_bitmap_init(void);
 void migration_bitmap_free(void);
+const unsigned long *migration_bitmap_get(void);
+void migration_bitmap_sync(void);
 
 extern SaveVMHandlers savevm_ram_handlers;
 
-- 
1.7.10.4

* [Qemu-devel] [PATCH v3 34/35] postcopy/outgoing: introduce precopy_count parameter
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (32 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 33/35] arch_init: export migration_bitmap_sync and helper method to get bitmap Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-11-01 21:20   ` Eric Blake
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 35/35] postcopy: pre+post optimization outgoing side Isaku Yamahata
                   ` (3 subsequent siblings)
  37 siblings, 1 reply; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Add a precopy_count parameter: the number of precopy loops to run before
switching to postcopy mode. The behavior itself is implemented by the
next patch.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
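Reviewer note (not part of the commit message): a minimal sketch of the
intended semantics, assuming that 0 means "no precopy" as the cover letter
states and that precopy also ends early once nothing is left to send.
demo_precopy_finished() and its parameters are hypothetical names, not
functions in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the source should stop precopy and enter postcopy.
 * requested_passes: the user-supplied precopy_count
 * completed_passes: how many full RAM scans have finished so far
 * bytes_remaining:  dirty RAM still waiting to be sent
 */
static bool demo_precopy_finished(int requested_passes,
                                  int completed_passes,
                                  uint64_t bytes_remaining)
{
    assert(requested_passes >= 0 && completed_passes >= 0);

    if (requested_passes == 0) {
        return true;            /* pure postcopy: skip precopy entirely */
    }
    if (bytes_remaining == 0) {
        return true;            /* nothing left for precopy to send */
    }
    return completed_passes >= requested_passes;
}
```

With precopy_count = 2, for example, the check stays false after one full
scan and turns true after the second, at which point the remaining dirty
pages are left to postcopy.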
 hmp-commands.hx      |   10 ++++++----
 hmp.c                |    2 ++
 migration-postcopy.c |    2 +-
 migration.c          |    2 ++
 migration.h          |    3 ++-
 qapi-schema.json     |    4 +++-
 qmp-commands.hx      |    2 +-
 savevm.c             |    3 ++-
 8 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 942f620..957bf76 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,9 +826,10 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,"
-	              "forward:i?,backward:i?",
-        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri [forward] [backword]",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,"
+	              "uri:s,precopy_count:i?,forward:i?,backward:i?",
+        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri "
+	              "[precopy_count] [forward] [backward]",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
@@ -837,6 +838,7 @@ ETEXI
 		      "\n\t\t\t-p for migration with postcopy mode enabled"
 		      "\n\t\t\t-m for move background transfer of postcopy mode"
 		      "\n\t\t\t-n for no background transfer of postcopy mode"
+		      "\n\t\t\tprecopy_count: the number of precopy loops before postcopy"
 		      "\n\t\t\tforward: the number of pages to "
 		      "forward-prefault when postcopy (default 0)"
 		      "\n\t\t\tbackward: the number of pages to "
@@ -846,7 +848,7 @@ ETEXI
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{forward} @var{backward}
+@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{precopy_count} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
diff --git a/hmp.c b/hmp.c
index a0bd869..be88db9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1038,6 +1038,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
     int movebg = qdict_get_try_bool(qdict, "movebg", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
+    int precopy_count = qdict_get_try_int(qdict, "precopy_count", 0);
     int forward = qdict_get_try_int(qdict, "forward", 0);
     int backward = qdict_get_try_int(qdict, "backward", 0);
     const char *uri = qdict_get_str(qdict, "uri");
@@ -1045,6 +1046,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 
     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
                 !!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg,
+                !!precopy_count, precopy_count,
                 !!forward, forward, !!backward, backward,
                 &err);
     if (err) {
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 9298cd4..8a43c42 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -319,7 +319,7 @@ int postcopy_outgoing_create_read_socket(MigrationState *s)
     return 0;
 }
 
-void postcopy_outgoing_state_begin(QEMUFile *f)
+void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams *params)
 {
     uint64_t options = 0;
     qemu_put_ubyte(f, QEMU_VM_POSTCOPY_INIT);
diff --git a/migration.c b/migration.c
index 057ea31..84ca4b3 100644
--- a/migration.c
+++ b/migration.c
@@ -513,6 +513,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_postcopy, bool postcopy,
                  bool has_movebg, bool movebg,
                  bool has_nobg, bool nobg,
+                 bool has_precopy_count, int64_t precopy_count,
                  bool has_forward, int64_t forward,
                  bool has_backward, int64_t backward,
                  Error **errp)
@@ -527,6 +528,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.postcopy = postcopy;
     params.nobg = nobg;
     params.movebg = movebg;
+    params.precopy_count = precopy_count;
     params.prefault_forward = 0;
     if (has_forward) {
         if (forward < 0) {
diff --git a/migration.h b/migration.h
index 2801e7e..c4d7b0a 100644
--- a/migration.h
+++ b/migration.h
@@ -27,6 +27,7 @@ struct MigrationParams {
     bool postcopy;
     bool nobg;
     bool movebg;
+    int precopy_count;
     int64_t prefault_forward;
     int64_t prefault_backward;
 };
@@ -150,7 +151,7 @@ int64_t xbzrle_cache_resize(int64_t new_size);
 
 /* For outgoing postcopy */
 int postcopy_outgoing_create_read_socket(MigrationState *s);
-void postcopy_outgoing_state_begin(QEMUFile *f);
+void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams *params);
 void postcopy_outgoing_state_complete(
     QEMUFile *f, const uint8_t *buffer, size_t buffer_size);
 int postcopy_outgoing_ram_save_iterate(QEMUFile *f, void *opaque);
diff --git a/qapi-schema.json b/qapi-schema.json
index cf5d988..5c4d2f2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2089,6 +2089,8 @@
 # @detach: this argument exists only for compatibility reasons and
 #          is ignored by QEMU
 #
+# @precopy_count: #optional the number of precopy loops before postcopy
+#
 # Returns: nothing on success
 #
 # Since: 0.14.0
@@ -2096,7 +2098,7 @@
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
            '*postcopy': 'bool', '*movebg': 'bool', '*nobg': 'bool',
-           '*forward': 'int', '*backward': 'int'} }
+           '*precopy_count': 'int', '*forward': 'int', '*backward': 'int'} }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7028ece..b8cf174 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -518,7 +518,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,precopy_count:i?",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
diff --git a/savevm.c b/savevm.c
index 8d26354..32781e1 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1650,7 +1650,7 @@ int qemu_savevm_state_begin(QEMUFile *f,
     if (params->postcopy) {
         /* tell this is postcopy */
         qemu_put_byte(f, QEMU_VM_POSTCOPY);
-        postcopy_outgoing_state_begin(f);
+        postcopy_outgoing_state_begin(f, params);
     }
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
@@ -1839,6 +1839,7 @@ static int qemu_savevm_state(QEMUFile *f)
         .blk = 0,
         .shared = 0,
         .postcopy = 0,
+        .precopy_count = 0,
         .movebg = 0,
         .nobg = 0,
     };
-- 
1.7.10.4

* [Qemu-devel] [PATCH v3 35/35] postcopy: pre+post optimization outgoing side
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (33 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 34/35] postcopy/outgoing: introduce precopy_count parameter Isaku Yamahata
@ 2012-10-30  8:33 ` Isaku Yamahata
  2012-10-30 18:53 ` [Qemu-devel] [PATCH v3 00/35] postcopy live migration Benoit Hudzia
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-30  8:33 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini, chegu_vinod

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
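Reviewer note (not part of the commit message): a minimal sketch of the
arithmetic behind the per-block dirty bitmap record sent in
postcopy_outgoing_begin(). It assumes the payload is a whole number of
64-bit big-endian words and that the first word starts at the 64-page
boundary at or below the block's first page. The demo_* helpers are
hypothetical, and page_bits stands in for TARGET_PAGE_BITS.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of bitmap payload that follow one block's header fields. */
static uint64_t demo_bitmap_payload_bytes(uint64_t block_offset,
                                          uint64_t block_length,
                                          unsigned page_bits)
{
    uint64_t start = block_offset >> page_bits;        /* first page */
    uint64_t end = (block_offset + block_length) >> page_bits;
    uint64_t aligned_start = start & ~UINT64_C(63);    /* round down to 64 */
    uint64_t words;

    assert(end >= start);
    words = (end - aligned_start + 63) / 64;           /* round up */
    return words * sizeof(uint64_t);
}

/* Store val most-significant byte first (big-endian byte order). */
static void demo_put_be64(uint8_t out[8], uint64_t val)
{
    int i;

    for (i = 0; i < 8; i++) {
        out[i] = (uint8_t)(val >> (56 - 8 * i));
    }
}
```

With 4 KiB pages, a 1 MiB block at offset 0 covers 256 pages and so needs
four words, i.e. 32 bytes of payload. A record whose idstr length, offset,
length and bitmap length are all zero terminates the list.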
 arch_init.c          |    6 ++--
 migration-postcopy.c |   94 +++++++++++++++++++++++++++++++++++++++++++++++---
 migration.h          |    1 +
 3 files changed, 94 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 49fbaff..f9bd483 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -502,8 +502,10 @@ bool ram_save_block(QEMUFile *f, bool last_stage)
         if (offset >= block->length) {
             offset = 0;
             block = QLIST_NEXT(block, next);
-            if (!block)
+            if (!block) {
                 block = QLIST_FIRST(&ram_list.blocks);
+                migrate_get_current()->precopy_count++;
+            }
         }
     } while (block != last_block || offset != last_offset);
 
@@ -619,7 +621,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         acct_clear();
     }
 
-    if (!params->postcopy) {
+    if (!(params->postcopy && params->precopy_count == 0)) {
         memory_global_dirty_log_start();
         migration_bitmap_sync();
     }
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 8a43c42..3f63385 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -322,6 +322,10 @@ int postcopy_outgoing_create_read_socket(MigrationState *s)
 void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams *params)
 {
     uint64_t options = 0;
+    if (params->precopy_count > 0) {
+        options |= POSTCOPY_OPTION_PRECOPY;
+    }
+
     qemu_put_ubyte(f, QEMU_VM_POSTCOPY_INIT);
     qemu_put_be32(f, sizeof(options));
     qemu_put_be64(f, options);
@@ -337,12 +341,36 @@ void postcopy_outgoing_state_complete(
 
 int postcopy_outgoing_ram_save_iterate(QEMUFile *f, void *opaque)
 {
-    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-    return 1;
+    int ret;
+    MigrationState *s = migrate_get_current();
+    if (s->params.precopy_count == 0) {
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+        return 1;
+    }
+
+    ret = ram_save_iterate(f);
+    if (ret < 0) {
+        return ret;
+    }
+    if (ret == 1) {
+        DPRINTF("precopy worked\n");
+        return ret;
+    }
+    if (ram_bytes_remaining() == 0) {
+        DPRINTF("no more precopy\n");
+        return 1;
+    }
+    return s->precopy_count >= s->params.precopy_count ? 1 : 0;
 }
 
 int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque)
 {
+    MigrationState *s = migrate_get_current();
+    if (s->params.precopy_count > 0) {
+        /* Make sure all dirty bits are set */
+        migration_bitmap_sync();
+        memory_global_dirty_log_stop();
+    }
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     return 0;
 }
@@ -544,6 +572,7 @@ static void postcopy_outgoing_recv_handler(void *opaque)
 PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms)
 {
     PostcopyOutgoingState *s = g_new(PostcopyOutgoingState, 1);
+    const RAMBlock *block;
     DPRINTF("outgoing begin\n");
     qemu_buffered_file_drain(ms->file);
 
@@ -553,9 +582,64 @@ PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms)
     s->mig_read = ms->file_read;
     s->mig_buffered_write = ms->file;
 
-    /* Make sure all dirty bits are set */
-    memory_global_dirty_log_stop();
-    migration_bitmap_init();
+    if (ms->params.precopy_count > 0) {
+        QEMUFile *f = ms->file;
+        uint64_t last_long =
+            BITS_TO_LONGS(last_ram_offset() >> TARGET_PAGE_BITS);
+
+        /* send dirty bitmap */
+        qemu_mutex_lock_ramlist();
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            const unsigned long *bitmap = migration_bitmap_get();
+            uint64_t length;
+            uint64_t start;
+            uint64_t end;
+            uint64_t i;
+
+            qemu_put_byte(f, strlen(block->idstr));
+            qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+            qemu_put_be64(f, block->offset);
+            qemu_put_be64(f, block->length);
+
+            start = (block->offset >> TARGET_PAGE_BITS);
+            end = (block->offset + block->length) >> TARGET_PAGE_BITS;
+
+            length = BITS_TO_LONGS(end - (start & ~63)) * sizeof(unsigned long);
+            length = DIV_ROUND_UP(length, sizeof(uint64_t)) * sizeof(uint64_t);
+            qemu_put_be64(f, length);
+            DPRINTF("dirty bitmap %s 0x%"PRIx64" 0x%"PRIx64" 0x%"PRIx64"\n",
+                    block->idstr, block->offset, block->length, length);
+
+            start /= BITS_PER_LONG;
+            end = DIV_ROUND_UP(end, BITS_PER_LONG);
+            assert(end <= last_long);
+
+            for (i = start; i < end;
+                 i += sizeof(uint64_t) / sizeof(unsigned long)) {
+                uint64_t val;
+#if HOST_LONG_BITS == 64
+                val = bitmap[i];
+#elif HOST_LONG_BITS == 32
+                if (i + 1 < last_long) {
+                    val = bitmap[i] | ((uint64_t)bitmap[i + 1] << 32);
+                } else {
+                    val = bitmap[i];
+                }
+#else
+# error "unsupported"
+#endif
+                qemu_put_be64(f, val);
+            }
+        }
+        qemu_mutex_unlock_ramlist();
+
+        /* terminator */
+        qemu_put_byte(f, 0);    /* idstr len */
+        qemu_put_be64(f, 0);    /* block offset */
+        qemu_put_be64(f, 0);    /* block length */
+        qemu_put_be64(f, 0);    /* bitmap len */
+        DPRINTF("sent dirty bitmap\n");
+    }
 
     qemu_set_fd_handler(s->fd_read,
                         &postcopy_outgoing_recv_handler, NULL, s);
diff --git a/migration.h b/migration.h
index c4d7b0a..9bd4062 100644
--- a/migration.h
+++ b/migration.h
@@ -58,6 +58,7 @@ struct MigrationState
     int fd_read;
     QEMUFile *file_read;        /* connection from the detination */
     PostcopyOutgoingState *postcopy;
+    int precopy_count;
 };
 
 void process_incoming_migration(QEMUFile *f);
-- 
1.7.10.4

* Re: [Qemu-devel] [PATCH v3 00/35] postcopy live migration
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (34 preceding siblings ...)
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 35/35] postcopy: pre+post optimization outgoing side Isaku Yamahata
@ 2012-10-30 18:53 ` Benoit Hudzia
  2012-10-31  3:25   ` Isaku Yamahata
  2012-10-30 18:55 ` Benoit Hudzia
  2012-11-06 11:04 ` Orit Wasserman
  37 siblings, 1 reply; 47+ messages in thread
From: Benoit Hudzia @ 2012-10-30 18:53 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: aarcange, aliguori, kvm, quintela, stefanha, t.hirofuchi, dlaor,
	satoshi.itoh, qemu-devel, Petter Svärd, mdroth,
	yoshikawa.takuya, owasserm, Hudzia, Benoit, avi, steve.walsh,
	pbonzini, chegu_vinod

[-- Attachment #1: Type: text/plain, Size: 12706 bytes --]

Hi Isaku,


Are you going to be at the KVM Forum? (I think you have a presentation
there.) It would be nice if we could meet to see whether we can sync our
efforts.

As you know, we have been developing an RDMA-based solution for postcopy
migration, and we demonstrated the initial proof of concept in December
2012 (we published some findings at VHPC 2012 and are working with Petter
Svärd from Umeå on a journal paper with a more detailed performance
review).

While RDMA postcopy live migration is just a by-product of our long-term
effort (I will present the project in my talk at the KVM Forum), we took
the opportunity to address problems we were facing with the live migration
of enterprise workloads, namely how to migrate an in-memory database such
as HANA under load.

We quickly discovered that precopy (even with optimizations) didn't work
with such workloads. We also tried your code; however, the performance was
far from satisfactory with large VMs under load, due to the heavy cost of
transferring memory between user space and the kernel multiple times (it
actually often failed).

We then tested a pure RDMA solution we developed (we support hardware and
software RDMA), and it works fine with all the workloads we tested (we
migrated a VM with 20+ GB running SAP HANA under a workload similar to
TPC-H). We hope to test bigger configurations soon (1/2+ TB of memory).

However, the integration of our code with the QEMU code base is not as
advanced and polished as yours, and I would like to know whether you would
be interested in joining our effort or collaborating on merging our
solutions, or perhaps in letting us piggyback on your effort.

Would you be free to meet at any time next week (Tuesday to Friday)?

PS: we will be open-sourcing our project by the end of November, and
postcopy is only a small part of the technology developed.


Regards
Benoit


On 30 October 2012 08:32, Isaku Yamahata <yamahata@valinux.co.jp> wrote:

> This is the v3 patch series of postcopy migration.
>
> The trees is available at
> git://github.com/yamahata/qemu.git qemu-postcopy-oct-30-2012
> git://github.com/yamahata/linux-umem.git linux-umem-oct-29-2012
>
> Major changes v2 -> v3:
> - implemented pre+post optimization
> - auto detection of postcopy by incoming side
> - using threads on destination instead of fork
> - using blocking io instead of select + non-blocking io loop
> - less memory overhead
> - various improvement and code simplification
> - kernel module name change umem -> uvmem to avoid name conflict.
>
> Patches organization:
> 1-2: trivial fixes
> 3-5: prepartion for threading. cherry-picked from migration tree
> 6-18: refactoring existing code and preparation
> 19-25: implement postcopy live migration itself (essential part)
> 26-35: optimization/heuristic for postcopy
>
> Usage
> =====
> You need load uvmem character device on the host before starting migration.
> Postcopy can be used for tcg and kvm accelarator. The implementation depend
> on only linux uvmem character device. But the driver dependent code is
> split into a file.
> I tested only host page size == guest page size case, but the
> implementation allows host page size != guest page size case.
>
> The following options are added with this patch series.
> - incoming part
>   use -incoming as usual. Postcopy is automatically detected.
>   example:
>   qemu -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>
> - outging part
>   options for migrate command
>   migrate [-p [-n] [-m]] URI
>           [<precopy count> [<prefault forward> [<prefault backword>]]]
>
>   Newly added options/arguments
>   -p: indicate postcopy migration
>   -n: disable background transferring pages: This is for
>       benchmark/debugging
>   -m: move background transfer of postcopy mode
>   <precopy count>: The number of precopy RAM scan before postcopy.
>                    default 0 (0 means no precopy)
>   <prefault forward>: The number of forward pages which is sent with
>                       on-demand
>   <prefault backward>: The number of backward pages which is sent with
>                        on-demand
>
>   example:
>   migrate -p -n tcp:<dest ip address>:4444
>   migrate -p -n -m tcp:<dest ip address>:4444 42 42 0
>
>
> TODO
> ====
> - benchmark/evaluation
> - improve/optimization
>   At the moment at least what I'm aware of is
>   - pre+post case
>     On desitnation side reading dirty bitmap would cause long latency.
>     create thread for that.
> - consider on FUSE/CUSE possibility
>
> basic postcopy work flow
> ========================
>         qemu on the destination
>               |
>               V
>         open(/dev/uvmem)
>               |
>               V
>         UVMEM_INIT
>               |
>               V
>         Here we have two file descriptors to
>         umem device and shmem file
>               |
>               |                                  umem threads
>               |                                  on the destination
>               |
>               V    create pipe to communicate
>         crete threads--------------------------------,
>               |                                      |
>               V                                   mmap(shmem file)
>         mmap(uvmem device) for guest RAM          close(shmem file)
>               |                                      |
>               |                                      |
>               V                                      |
>         wait for ready from daemon <----pipe-----send ready message
>               |                                      |
>               |                                 Here the daemon takes over
>         send ok------------pipe---------------> the owner of the socket
>               |                                 to the source
>               V                                      |
>         entering post copy stage                     |
>         start guest execution                        |
>               |                                      |
>               V                                      V
>         access guest RAM                          read() to get faulted pages
>               |                                      |
>               V                                      V
>         page fault ------------------------------>page offset is returned
>         block                                        |
>                                                      V
>                                                   pull page from the source
>                                                   write the page contents
>                                                   to the shmem.
>                                                      |
>                                                      V
>         unblock     <-----------------------------write() to tell served pages
>         the fault handler returns the page           |
>         page fault is resolved                       |
>               |                                      V
>               |                                   touch guest RAM pages
>               |                                      |
>               |                                      V
>               |                                   release the cached page
>               |                                   madvise(MADV_REMOVE)
>               |
>               |
>               |                                   pages can be sent
>               |                                   backgroundly
>               |                                      |
>               |                                      V
>               |                                   mark page is cached
>               |                                   Thus future page fault is
>               |                                   avoided.
>               |                                      |
>               |                                      V
>               |                                   touch guest RAM pages
>               |                                      |
>               |                                      V
>               |                                   release the cached page
>               |                                   madvise(MADV_REMOVE)
>               |                                      |
>               V                                      V
>
>                  all the pages are pulled from the source
>
>               |                                      |
>               V                                      V
>         migration completes                        exit()
>
>
> Isaku Yamahata (32):
>   migration.c: remove redundant line in migrate_init()
>   arch_init: DPRINTF format error and typo
>   osdep: add qemu_read_full() to read interrupt-safely
>   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip,
>     qemu_fflush
>   savevm/QEMUFile: consolidate QEMUFile functions a bit
>   savevm/QEMUFile: introduce qemu_fopen_fd
>   savevm/QEMUFile: add read/write QEMUFile on memory buffer
>   savevm, buffered_file: introduce method to drain buffer of buffered
>     file
>   arch_init: export RAM_SAVE_xxx flags for postcopy
>   arch_init/ram_save: introduce constant for ram save version = 4
>   arch_init: refactor ram_save_block() and export ram_save_block()
>   arch_init/ram_save_setup: factor out bitmap alloc/free
>   arch_init/ram_load: refactor ram_load
>   arch_init: factor out logic to find ram block with id string
>   migration: export migrate_fd_completed() and migrate_fd_cleanup()
>   uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
>   osdep: add QEMU_MADV_REMOVE and tirivial fix
>   postcopy: introduce helper functions for postcopy
>   savevm: add new section that is used by postcopy
>   postcopy: implement incoming part of postcopy live migration
>   postcopy outgoing: add -p option to migrate command
>   postcopy: implement outgoing part of postcopy live migration
>   postcopy/outgoing: add -n options to disable background transfer
>   postcopy/outgoing: implement forward/backword prefault
>   arch_init: factor out setting last_block, last_offset
>   postcopy/outgoing: add movebg mode(-m) to migration command
>   arch_init: factor out ram_load
>   arch_init: export ram_save_iterate()
>   postcopy: pre+post optimization incoming side
>   arch_init: export migration_bitmap_sync and helper method to get
>     bitmap
>   postcopy/outgoing: introduce precopy_count parameter
>   postcopy: pre+post optimization outgoing side
>
> Paolo Bonzini (1):
>   split MRU ram list
>
> Umesh Deshpande (2):
>   add a version number to ram_list
>   protect the ramlist with a separate mutex
>
>  Makefile.target                 |    2 +
>  arch_init.c                     |  391 +++++---
>  arch_init.h                     |   24 +
>  buffered_file.c                 |   59 +-
>  buffered_file.h                 |    1 +
>  cpu-all.h                       |   16 +-
>  exec.c                          |   62 +-
>  hmp-commands.hx                 |   21 +-
>  hmp.c                           |   12 +-
>  linux-headers/linux/uvmem.h     |   41 +
>  migration-exec.c                |    8 +-
>  migration-fd.c                  |   23 +-
>  migration-postcopy.c            | 2019 +++++++++++++++++++++++++++++++++++++++
>  migration-tcp.c                 |   16 +-
>  migration-unix.c                |   36 +-
>  migration.c                     |   65 +-
>  migration.h                     |   42 +
>  osdep.c                         |   24 +
>  osdep.h                         |   13 +-
>  qapi-schema.json                |    6 +-
>  qemu-common.h                   |    2 +
>  qemu-file.h                     |   12 +-
>  qmp-commands.hx                 |    4 +-
>  savevm.c                        |  223 ++++-
>  scripts/update-linux-headers.sh |    2 +-
>  sysemu.h                        |    2 +-
>  umem.c                          |  291 ++++++
>  umem.h                          |   88 ++
>  vl.c                            |    5 +-
>  29 files changed, 3265 insertions(+), 245 deletions(-)
>  create mode 100644 linux-headers/linux/uvmem.h
>  create mode 100644 migration-postcopy.c
>  create mode 100644 umem.c
>  create mode 100644 umem.h
>
> --
> 1.7.10.4
>



-- 
" The production of too many useful things results in too many useless
people"

[-- Attachment #2: Type: text/html, Size: 15719 bytes --]

* Re: [Qemu-devel] [PATCH v3 00/35] postcopy live migration
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (35 preceding siblings ...)
  2012-10-30 18:53 ` [Qemu-devel] [PATCH v3 00/35] postcopy live migration Benoit Hudzia
@ 2012-10-30 18:55 ` Benoit Hudzia
  2012-11-06 11:04 ` Orit Wasserman
  37 siblings, 0 replies; 47+ messages in thread
From: Benoit Hudzia @ 2012-10-30 18:55 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: aarcange, aliguori, kvm, quintela, stefanha, t.hirofuchi, dlaor,
	satoshi.itoh, qemu-devel, mdroth, yoshikawa.takuya, owasserm,
	Hudzia, Benoit, avi, Shribman, Aidan, steve.walsh, pbonzini,
	chegu_vinod

Hi Isaku,


Are you going to be at the KVM forum ( i think you have a presentation
there). It would be nice if we could meet in order to see if we can
synch our efforts .

As you know we have been developing an RDMA based solution for post
copy migration and  we demonstrated the initial proof of concept in
december 2012 ( we published some finding  in VHPC 2012 and are
working with Petter Svard from Umea on a journal paper with more
detailed performance review) .

While  RDMA post copy live migration is just of by product of our long
term effort ( i will present the project  in my talk at KVM forum)  we
grabbed the opportunity  to address problems we were facing with the
live migration of enterprise workload . Namely how to migrate in
memory database such has HANA under load.

We quickly discovered that pre copy ( even with optimization ) didn't
work with such workload. We also tried your code however the
performance where far from satisfying with large VM under load due to
the heavy cost of transferring memory between user space - kernel
multiple time ( actually it often failed)

We then tested a pure RDMA solution we developed (we support HW and
software RDMA), and it worked fine with all the workloads we tested
(we migrated VMs with 20+ GB running SAP HANA under a workload similar
to TPC-H). We hope to test with bigger configurations soon (1/2+ TB
of memory).

However, the integration of our code with the QEMU code base
is not as advanced and polished as the one you currently have, and I
would like to know whether you would be interested in joining our
effort or collaborating on merging our solutions, or perhaps in
allowing us to piggyback on your effort.

Would you be free to meet at any time next week (from Tuesday to Friday)?

PS: we plan to open-source our project by the end of November, and
postcopy is only a small part of the technology developed.


Regards
Benoit


On 30 October 2012 08:32, Isaku Yamahata <yamahata@valinux.co.jp> wrote:
>
> This is the v3 patch series of postcopy migration.
>
> The trees are available at
> git://github.com/yamahata/qemu.git qemu-postcopy-oct-30-2012
> git://github.com/yamahata/linux-umem.git linux-umem-oct-29-2012
>
> Major changes v2 -> v3:
> - implemented pre+post optimization
> - auto detection of postcopy by incoming side
> - using threads on destination instead of fork
> - using blocking io instead of select + non-blocking io loop
> - less memory overhead
> - various improvement and code simplification
> - kernel module name change umem -> uvmem to avoid name conflict.
>
> Patches organization:
> 1-2: trivial fixes
> 3-5: preparation for threading, cherry-picked from migration tree
> 6-18: refactoring existing code and preparation
> 19-25: implement postcopy live migration itself (essential part)
> 26-35: optimization/heuristic for postcopy
>
> Usage
> =====
> You need to load the uvmem character device on the host before starting
> migration.
> Postcopy can be used with the tcg and kvm accelerators. The implementation
> depends only on the Linux uvmem character device, but the driver-dependent
> code is split out into a separate file.
> I tested only the host page size == guest page size case, but the
> implementation allows the host page size != guest page size case.
>
> The following options are added with this patch series.
> - incoming part
>   use -incoming as usual. Postcopy is automatically detected.
>   example:
>   qemu -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>
> - outgoing part
>   options for migrate command
>   migrate [-p [-n] [-m]] URI
>           [<precopy count> [<prefault forward> [<prefault backward>]]]
>
>   Newly added options/arguments
>   -p: indicate postcopy migration
>   -n: disable background transfer of pages. This is for
>       benchmarking/debugging
>   -m: move background transfer of postcopy mode
>   <precopy count>: The number of precopy RAM scans before postcopy.
>                    default 0 (0 means no precopy)
>   <prefault forward>: The number of forward pages which are sent along
>                       with the on-demand page
>   <prefault backward>: The number of backward pages which are sent along
>                        with the on-demand page
>
>   example:
>   migrate -p -n tcp:<dest ip address>:4444
>   migrate -p -n -m tcp:<dest ip address>:4444 42 42 0
>
>
> TODO
> ====
> - benchmark/evaluation
> - improve/optimization
>   At the moment at least what I'm aware of is
>   - pre+post case
>     On the destination side, reading the dirty bitmap would cause long
>     latency. Create a thread for that.
> - consider the FUSE/CUSE possibility
>
> basic postcopy work flow
> ========================
>         qemu on the destination
>               |
>               V
>         open(/dev/uvmem)
>               |
>               V
>         UVMEM_INIT
>               |
>               V
>         Here we have two file descriptors to
>         umem device and shmem file
>               |
>               |                                  umem threads
>               |                                  on the destination
>               |
>               V    create pipe to communicate
>         create threads-------------------------------,
>               |                                      |
>               V                                   mmap(shmem file)
>         mmap(uvmem device) for guest RAM          close(shmem file)
>               |                                      |
>               |                                      |
>               V                                      |
>         wait for ready from daemon <----pipe-----send ready message
>               |                                      |
>               |                                 Here the daemon takes over
>         send ok------------pipe---------------> the owner of the socket
>               |                                 to the source
>               V                                      |
>         entering post copy stage                     |
>         start guest execution                        |
>               |                                      |
>               V                                      V
>         access guest RAM                          read() to get faulted pages
>               |                                      |
>               V                                      V
>         page fault ------------------------------>page offset is returned
>         block                                        |
>                                                      V
>                                                   pull page from the source
>                                                   write the page contents
>                                                   to the shmem.
>                                                      |
>                                                      V
>         unblock     <-----------------------------write() to tell served pages
>         the fault handler returns the page           |
>         page fault is resolved                       |
>               |                                      V
>               |                                   touch guest RAM pages
>               |                                      |
>               |                                      V
>               |                                   release the cached page
>               |                                   madvise(MADV_REMOVE)
>               |
>               |
>               |                                   pages can be sent
>               |                                   in the background
>               |                                      |
>               |                                      V
>               |                                   mark page as cached
>               |                                   Thus future page faults
>               |                                   are avoided.
>               |                                      |
>               |                                      V
>               |                                   touch guest RAM pages
>               |                                      |
>               |                                      V
>               |                                   release the cached page
>               |                                   madvise(MADV_REMOVE)
>               |                                      |
>               V                                      V
>
>                  all the pages are pulled from the source
>
>               |                                      |
>               V                                      V
>         migration completes                        exit()
>
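The flow in the diagram above can be sketched as a small simulation. This
is only an illustration under stated assumptions: it does not use the real
uvmem interface, and every name in it (postcopy_sim, sim_handle_fault,
sim_background_step) is hypothetical. It shows just the bookkeeping idea
that a page is pulled from the source once, either on demand or in the
background, and that a cached page never needs to be pulled again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state kept on the destination side. */
enum page_state { PAGE_MISSING, PAGE_CACHED };

struct postcopy_sim {
    enum page_state *state; /* one entry per guest RAM page */
    size_t nr_pages;
    size_t pulled;          /* pages fetched from the source so far */
};

/* Pull one page from the source unless it is already cached.
 * Returns true if a transfer actually happened. */
static bool sim_pull_page(struct postcopy_sim *s, size_t page)
{
    assert(page < s->nr_pages);
    if (s->state[page] == PAGE_CACHED) {
        return false;
    }
    s->state[page] = PAGE_CACHED;
    s->pulled++;
    return true;
}

/* On-demand path: the guest faulted on 'page'. */
static bool sim_handle_fault(struct postcopy_sim *s, size_t page)
{
    return sim_pull_page(s, page);
}

/* Background path: fetch the lowest-numbered missing page.
 * Returns false once every page has been pulled. */
static bool sim_background_step(struct postcopy_sim *s)
{
    for (size_t i = 0; i < s->nr_pages; i++) {
        if (sim_pull_page(s, i)) {
            return true;
        }
    }
    return false;
}
```

The linear scan in sim_background_step() is chosen for brevity; a real
implementation would presumably keep a cursor or a bitmap instead.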
>
> Isaku Yamahata (32):
>   migration.c: remove redundant line in migrate_init()
>   arch_init: DPRINTF format error and typo
>   osdep: add qemu_read_full() to read interrupt-safely
>   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip,
>     qemu_fflush
>   savevm/QEMUFile: consolidate QEMUFile functions a bit
>   savevm/QEMUFile: introduce qemu_fopen_fd
>   savevm/QEMUFile: add read/write QEMUFile on memory buffer
>   savevm, buffered_file: introduce method to drain buffer of buffered
>     file
>   arch_init: export RAM_SAVE_xxx flags for postcopy
>   arch_init/ram_save: introduce constant for ram save version = 4
>   arch_init: refactor ram_save_block() and export ram_save_block()
>   arch_init/ram_save_setup: factor out bitmap alloc/free
>   arch_init/ram_load: refactor ram_load
>   arch_init: factor out logic to find ram block with id string
>   migration: export migrate_fd_completed() and migrate_fd_cleanup()
>   uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
>   osdep: add QEMU_MADV_REMOVE and tirivial fix
>   postcopy: introduce helper functions for postcopy
>   savevm: add new section that is used by postcopy
>   postcopy: implement incoming part of postcopy live migration
>   postcopy outgoing: add -p option to migrate command
>   postcopy: implement outgoing part of postcopy live migration
>   postcopy/outgoing: add -n options to disable background transfer
>   postcopy/outgoing: implement forward/backword prefault
>   arch_init: factor out setting last_block, last_offset
>   postcopy/outgoing: add movebg mode(-m) to migration command
>   arch_init: factor out ram_load
>   arch_init: export ram_save_iterate()
>   postcopy: pre+post optimization incoming side
>   arch_init: export migration_bitmap_sync and helper method to get
>     bitmap
>   postcopy/outgoing: introduce precopy_count parameter
>   postcopy: pre+post optimization outgoing side
>
> Paolo Bonzini (1):
>   split MRU ram list
>
> Umesh Deshpande (2):
>   add a version number to ram_list
>   protect the ramlist with a separate mutex
>
>  Makefile.target                 |    2 +
>  arch_init.c                     |  391 +++++---
>  arch_init.h                     |   24 +
>  buffered_file.c                 |   59 +-
>  buffered_file.h                 |    1 +
>  cpu-all.h                       |   16 +-
>  exec.c                          |   62 +-
>  hmp-commands.hx                 |   21 +-
>  hmp.c                           |   12 +-
>  linux-headers/linux/uvmem.h     |   41 +
>  migration-exec.c                |    8 +-
>  migration-fd.c                  |   23 +-
>  migration-postcopy.c            | 2019 +++++++++++++++++++++++++++++++++++++++
>  migration-tcp.c                 |   16 +-
>  migration-unix.c                |   36 +-
>  migration.c                     |   65 +-
>  migration.h                     |   42 +
>  osdep.c                         |   24 +
>  osdep.h                         |   13 +-
>  qapi-schema.json                |    6 +-
>  qemu-common.h                   |    2 +
>  qemu-file.h                     |   12 +-
>  qmp-commands.hx                 |    4 +-
>  savevm.c                        |  223 ++++-
>  scripts/update-linux-headers.sh |    2 +-
>  sysemu.h                        |    2 +-
>  umem.c                          |  291 ++++++
>  umem.h                          |   88 ++
>  vl.c                            |    5 +-
>  29 files changed, 3265 insertions(+), 245 deletions(-)
>  create mode 100644 linux-headers/linux/uvmem.h
>  create mode 100644 migration-postcopy.c
>  create mode 100644 umem.c
>  create mode 100644 umem.h
>
> --
> 1.7.10.4
>




--
" The production of too many useful things results in too many useless
people"

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/35] postcopy live migration
  2012-10-30 18:53 ` [Qemu-devel] [PATCH v3 00/35] postcopy live migration Benoit Hudzia
@ 2012-10-31  3:25   ` Isaku Yamahata
  0 siblings, 0 replies; 47+ messages in thread
From: Isaku Yamahata @ 2012-10-31  3:25 UTC (permalink / raw)
  To: Benoit Hudzia
  Cc: aarcange, aliguori, kvm, quintela, stefanha, t.hirofuchi, dlaor,
	satoshi.itoh, qemu-devel, Petter Svärd, mdroth,
	yoshikawa.takuya, owasserm, Hudzia, Benoit, avi, pbonzini,
	steve.walsh, chegu_vinod

On Tue, Oct 30, 2012 at 06:53:31PM +0000, Benoit Hudzia wrote:
> Hi Isaku,
> 
> 
> Are you going to be at the KVM Forum? (I think you have a presentation there.)
> It would be nice if we could meet to see whether we can sync our efforts.

Yes, definitely.

> As you know, we have been developing an RDMA-based solution for postcopy
> migration, and we demonstrated the initial proof of concept in December 2012
> (we published some findings at VHPC 2012 and are working with Petter Svard
> from Umea on a journal paper with a more detailed performance review).

Do you have any pointers to available papers/slides?
I can't find any at http://vhpc.org/


> While RDMA postcopy live migration is just a by-product of our long-term
> effort (I will present the project in my talk at KVM Forum), we grabbed the
> opportunity to address problems we were facing with the live migration of
> enterprise workloads, namely how to migrate an in-memory database such as
> HANA under load.
> 
> We quickly discovered that precopy (even with optimizations) didn't work with
> such workloads. We also tried your code; however, the performance was far from
> satisfactory with large VMs under load, due to the heavy cost of transferring
> memory between user space and the kernel multiple times (actually, it often
> failed).

If possible, I'd like to see the details.


> We then tested a pure RDMA solution we developed (we support HW and software
> RDMA), and it worked fine with all the workloads we tested (we migrated VMs
> with 20+ GB running SAP HANA under a workload similar to TPC-H). We hope to
> test with bigger configurations soon (1/2+ TB of memory).
> 
> However, the integration of our code with the QEMU code base is not as
> advanced and polished as the one you currently have, and I would like to know
> whether you would be interested in joining our effort or collaborating on
> merging our solutions, or perhaps in allowing us to piggyback on your effort.

Yeah, we can unite our efforts for upstream.
In particular, a clean interface for both non-RDMA and RDMA (QEMU-internal and
QEMU-kernel) is important.
At the moment I have no idea about the requirements of RDMA postcopy or about
your implementation.
"transparently integrating with the MMU at the OS level" sounds interesting.

thanks,

> Would you be free to meet at any time next week (from Tuesday to Friday)?
> 
> PS: we plan to open-source our project by the end of November, and
> postcopy is only a small part of the technology developed.
> 
> 
> Regards
> Benoit
> 
> 

-- 
yamahata

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v3 24/35] postcopy outgoing: add -p option to migrate command
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 24/35] postcopy outgoing: add -p option to migrate command Isaku Yamahata
@ 2012-11-01 19:48   ` Eric Blake
  0 siblings, 0 replies; 47+ messages in thread
From: Eric Blake @ 2012-11-01 19:48 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, owasserm, avi, pbonzini, chegu_vinod

[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]

On 10/30/2012 02:33 AM, Isaku Yamahata wrote:
> Added -p option to migrate command for postcopy mode and
> introduce postcopy parameter for migration to indicate that postcopy mode
> is enabled.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

> diff --git a/qapi-schema.json b/qapi-schema.json
> index c615ee2..c969e5a 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2094,7 +2094,8 @@
>  # Since: 0.14.0
>  ##
>  { 'command': 'migrate',
> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
> +           '*postcopy': 'bool'} }

You should also document this new variable a few lines earlier,
something like:

# @postcopy: #optional if true, perform a postcopy migration
#            (since 1.3, default false)

Also, I have to wonder if this should go through
migrate-set-capabilities rather than adding a new field to 'migrate'.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v3 26/35] postcopy/outgoing: add -n options to disable background transfer
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 26/35] postcopy/outgoing: add -n options to disable background transfer Isaku Yamahata
@ 2012-11-01 19:56   ` Eric Blake
  0 siblings, 0 replies; 47+ messages in thread
From: Eric Blake @ 2012-11-01 19:56 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, owasserm, avi, pbonzini, chegu_vinod

[-- Attachment #1: Type: text/plain, Size: 830 bytes --]

On 10/30/2012 02:33 AM, Isaku Yamahata wrote:
> This is for benchmark purpose
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

> +++ b/qapi-schema.json
> @@ -2095,7 +2095,7 @@
>  ##
>  { 'command': 'migrate',
>    'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
> -           '*postcopy': 'bool'} }
> +           '*postcopy': 'bool', '*nobg': 'bool'} }

Again, document this option above, and mention that it was introduced in
1.3.  In QMP, we prefer easier-to-read strings; I would consider naming
it 'background' with a default of true, where you pass false to get the
new behavior, instead of 'nobg' with a default of false, which causes
double-negative logic.
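
To make the naming suggestion concrete, here is a minimal sketch of how
the positive form could be resolved on the C side. It assumes the usual
QAPI convention that an optional member arrives as a has_<name> flag plus
the value; the helper name postcopy_background_enabled() is hypothetical
and is not part of the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: resolve the optional 'background' argument.
 * When the caller omits it, background transfer stays enabled, so the
 * default is true and no double negative is needed.
 */
static bool postcopy_background_enabled(bool has_background, bool background)
{
    return has_background ? background : true;
}
```

With this shape, a caller checks "if (postcopy_background_enabled(...))"
rather than "if (!nobg)".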

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault Isaku Yamahata
@ 2012-11-01 20:10   ` Eric Blake
  2012-11-02  5:24     ` Isaku Yamahata
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Blake @ 2012-11-01 20:10 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, owasserm, avi, pbonzini, chegu_vinod

[-- Attachment #1: Type: text/plain, Size: 1796 bytes --]

On 10/30/2012 02:33 AM, Isaku Yamahata wrote:
> When a page is requested, the surrounding pages are also sent.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
>  hmp-commands.hx      |   15 ++++++++-----
>  hmp.c                |    3 +++
>  migration-postcopy.c |   57 +++++++++++++++++++++++++++++++++++++++++++++-----
>  migration.c          |   20 ++++++++++++++++++
>  migration.h          |    2 ++
>  qapi-schema.json     |    3 ++-
>  6 files changed, 89 insertions(+), 11 deletions(-)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index b054760..5e2c77c 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -826,26 +826,31 @@ ETEXI
>  
>      {
>          .name       = "migrate",
> -        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
> -        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
> +        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
> +	              "forward:i?,backward:i?",
> +        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",

I don't care what we do to the 'migrate' HMP command, but for QMP...

> +++ b/qapi-schema.json
> @@ -2095,7 +2095,8 @@
>  ##
>  { 'command': 'migrate',
>    'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
> -           '*postcopy': 'bool', '*nobg': 'bool'} }
> +           '*postcopy': 'bool', '*nobg': 'bool',
> +           '*forward': 'int', '*backward': 'int'} }

Do we really want to be adding new options to migrate (and if so,
where's the documentation), or do we need a new monitor command similar
to migrate-set-capabilities or migrate-set-cache-size?
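
For readers following along, the behavior these two parameters control
(per the commit message, surrounding pages are sent together with the
requested page) can be sketched as a clamped page window. This is an
illustration only: prefault_window() and struct page_range are
hypothetical names, and the clamping at the ends of the RAM block is an
assumption, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range of page indexes to send for one on-demand request. */
struct page_range {
    size_t first;
    size_t last;
};

/*
 * Hypothetical helper: given the faulting page and the forward/backward
 * prefault counts, return the window of pages to send, clamped to
 * [0, nr_pages - 1].
 */
static struct page_range prefault_window(size_t page, size_t nr_pages,
                                         size_t forward, size_t backward)
{
    struct page_range r;

    assert(page < nr_pages);
    r.first = (backward > page) ? 0 : page - backward;
    r.last = (forward > nr_pages - 1 - page) ? nr_pages - 1 : page + forward;
    return r;
}
```

Under these assumptions, a forward count of 42 and a backward count of 0
on page 10 would give the window 10..52.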

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v3 29/35] postcopy/outgoing: add movebg mode(-m) to migration command
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 29/35] postcopy/outgoing: add movebg mode(-m) to migration command Isaku Yamahata
@ 2012-11-01 20:15   ` Eric Blake
  0 siblings, 0 replies; 47+ messages in thread
From: Eric Blake @ 2012-11-01 20:15 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, owasserm, avi, pbonzini, chegu_vinod

On 10/30/2012 02:33 AM, Isaku Yamahata wrote:
> When movebg mode is enabled, the point to send background page is set
> to the next page to on-demand page.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

> +	-m for migratoin with postcopy mode enabled with moving position

s/migratoin/migration/

> +++ b/qapi-schema.json
> @@ -2095,7 +2095,7 @@
>  ##
>  { 'command': 'migrate',
>    'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
> -           '*postcopy': 'bool', '*nobg': 'bool',
> +           '*postcopy': 'bool', '*movebg': 'bool', '*nobg': 'bool',

Another undocumented option, and one which might be better named
'move-background'.  Also another candidate for migrate-set-capabilities.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v3 34/35] postcopy/outgoing: introduce precopy_count parameter
  2012-10-30  8:33 ` [Qemu-devel] [PATCH v3 34/35] postcopy/outgoing: introduce precopy_count parameter Isaku Yamahata
@ 2012-11-01 21:20   ` Eric Blake
  0 siblings, 0 replies; 47+ messages in thread
From: Eric Blake @ 2012-11-01 21:20 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, owasserm, avi, pbonzini, chegu_vinod

On 10/30/2012 02:33 AM, Isaku Yamahata wrote:
> Precopy with this loop number before postcopy mode.
> This will be implemented by the next patch.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
> +++ b/qapi-schema.json
> @@ -2089,6 +2089,8 @@
>  # @detach: this argument exists only for compatibility reasons and
>  #          is ignored by QEMU
>  #
> +# @precopy_count: #optional the number of loops of precopy when postcopy
> +#

Finally, a documented addition; but still lacking (since 1.3)
designation.  Also, QMP prefers '-' over '_', so I would name it
precopy-count, if we even decide to attach it to 'migrate' instead of a
new command or enhancement to 'migrate-set-capabilities'.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault
  2012-11-01 20:10   ` Eric Blake
@ 2012-11-02  5:24     ` Isaku Yamahata
  2012-11-02 15:22       ` Eric Blake
  0 siblings, 1 reply; 47+ messages in thread
From: Isaku Yamahata @ 2012-11-02  5:24 UTC (permalink / raw)
  To: Eric Blake
  Cc: benoit.hudzia, aarcange, aliguori, kvm, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, owasserm, avi, pbonzini, chegu_vinod

On Thu, Nov 01, 2012 at 02:10:45PM -0600, Eric Blake wrote:
> On 10/30/2012 02:33 AM, Isaku Yamahata wrote:
> > When page is requested, send surrounding pages are also sent.
> > 
> > Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> > ---
> >  hmp-commands.hx      |   15 ++++++++-----
> >  hmp.c                |    3 +++
> >  migration-postcopy.c |   57 +++++++++++++++++++++++++++++++++++++++++++++-----
> >  migration.c          |   20 ++++++++++++++++++
> >  migration.h          |    2 ++
> >  qapi-schema.json     |    3 ++-
> >  6 files changed, 89 insertions(+), 11 deletions(-)
> > 
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index b054760..5e2c77c 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -826,26 +826,31 @@ ETEXI
> >  
> >      {
> >          .name       = "migrate",
> > -        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
> > -        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
> > +        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
> > +	              "forward:i?,backward:i?",
> > +        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",
> 
> I don't care what we do to the 'migrate' HMP command, but for QMP...
> 
> > +++ b/qapi-schema.json
> > @@ -2095,7 +2095,8 @@
> >  ##
> >  { 'command': 'migrate',
> >    'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
> > -           '*postcopy': 'bool', '*nobg': 'bool'} }
> > +           '*postcopy': 'bool', '*nobg': 'bool',
> > +           '*forward': 'int', '*backward': 'int'} }
> 
> Do we really want to be adding new options to migrate (and if so,
> where's the documentation), or do we need a new monitor command similar
> to migrate-set-capabilities or migrate-set-cache-size?

Okay, migrate-set-capabilities seems usable for booleans and extensible
for the future.
On the other hand, migrate-set-cache-size takes only a single integer
argument, so it doesn't seem usable without modification.
How about this?

{ 'type': 'MigrationParameters',
  'data': {'parameter': 'name': 'str', 'value': 'int' } }

{ 'command': 'migrate-set-parameters',
   'data': { 'parameters' ['MigrationParameters']}}


{ 'command': 'query-migrate-parameters',
  'returns': [['MigrationParameters']]}
-- 
yamahata

* Re: [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault
  2012-11-02  5:24     ` Isaku Yamahata
@ 2012-11-02 15:22       ` Eric Blake
  0 siblings, 0 replies; 47+ messages in thread
From: Eric Blake @ 2012-11-02 15:22 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, owasserm, avi, pbonzini, chegu_vinod

On 11/01/2012 11:24 PM, Isaku Yamahata wrote:
>>> +++ b/qapi-schema.json
>>> @@ -2095,7 +2095,8 @@
>>>  ##
>>>  { 'command': 'migrate',
>>>    'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
>>> -           '*postcopy': 'bool', '*nobg': 'bool'} }
>>> +           '*postcopy': 'bool', '*nobg': 'bool',
>>> +           '*forward': 'int', '*backward': 'int'} }
>>
>> Do we really want to be adding new options to migrate (and if so,
>> where's the documentation), or do we need a new monitor command similar
>> to migrate-set-capabilities or migrate-set-cache-size?
> 
> Okay, migrate-set-capabilities seems usable for boolean and scalable
> for future extension.
> On the other hand, migrate-set-cache-size takes only single integer
> as arguments. So it doesn't seem usable without modification.
> How about this?
> 
> { 'type': 'MigrationParameters',
>   'data': {'parameter': 'name': 'str', 'value': 'int' } }

More like:

{ 'enum': 'MigrationParameterName',
  'data': ['ParameterName'... ] }

{ 'type': 'MigrationParameter',
  'data': {'parameter': 'MigrationParameterName', 'value': 'int' } }

> 
> { 'command': 'migrate-set-parameters',
>    'data': { 'parameters' ['MigrationParameters']}}

Yes, this seems more extensible.

> 
> 
> { 'command': 'query-migrate-parameters',
>   'returns': [['MigrationParameters']]}

One layer too many of [], but yes, this also seems reasonable.
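
For illustration only, here is roughly what a client-side call against the
interface discussed above might look like. This is a minimal Python sketch,
not part of the patch series: 'migrate-set-parameters' is only the command
proposed in this thread, and the parameter names in KNOWN_PARAMETERS are
hypothetical placeholders, since the enum members were left open above.

```python
import json

# Hypothetical parameter names. The thread leaves the enum members open
# ('ParameterName'...), so these are placeholders for illustration only.
KNOWN_PARAMETERS = {"precopy-count", "prefault-forward", "prefault-backward"}


def build_set_parameters(values):
    """Build a QMP message for the proposed 'migrate-set-parameters' command.

    'values' maps a parameter name to an integer, mirroring the proposed
    MigrationParameter type: {'parameter': <enum>, 'value': 'int'}.
    """
    params = []
    for name, value in sorted(values.items()):
        if name not in KNOWN_PARAMETERS:
            raise ValueError("unknown migration parameter: %s" % name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("%s must be an integer" % name)
        params.append({"parameter": name, "value": value})
    # A single list of parameters, i.e. one layer of [] only.
    return {"execute": "migrate-set-parameters",
            "arguments": {"parameters": params}}


if __name__ == "__main__":
    msg = build_set_parameters({"prefault-forward": 42,
                                "prefault-backward": 0})
    print(json.dumps(msg, indent=2))
```

Validating names against a fixed set mirrors the point of using an enum
rather than a free-form 'str' for the parameter name.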

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v3 00/35] postcopy live migration
  2012-10-30  8:32 [Qemu-devel] [PATCH v3 00/35] postcopy live migration Isaku Yamahata
                   ` (36 preceding siblings ...)
  2012-10-30 18:55 ` Benoit Hudzia
@ 2012-11-06 11:04 ` Orit Wasserman
  37 siblings, 0 replies; 47+ messages in thread
From: Orit Wasserman @ 2012-11-06 11:04 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, avi, pbonzini, chegu_vinod

Hi,
I haven't had time yet to review your patches in detail,
but I have one general comment about the interface to activate postcopy.
Postcopy needs to be supported by both the source and the destination QEMU,
and for this kind of feature we have the migration capabilities interface;
you can look at the XBZRLE patch series for more details.
So in order to activate postcopy, the user will need to run
"migrate_set_capability postcopy on" on both the source and destination QEMU
before starting the migration process.
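
As a rough illustration of the flow described above, the QMP side could be
sketched in Python as below. The 'migrate-set-capabilities' command and its
'capabilities' list of {'capability', 'state'} pairs are the existing
interface used by XBZRLE; the capability name "postcopy" is an assumption
taken from this suggestion and is not defined by the patch series.

```python
import json


def set_capability(name, enabled):
    """Return the QMP message that switches one migration capability."""
    return {"execute": "migrate-set-capabilities",
            "arguments": {"capabilities": [
                {"capability": name, "state": bool(enabled)}]}}


def postcopy_setup():
    """Messages to send before 'migrate' is issued on the source.

    The same capability has to be enabled on both monitors, so the
    message is identical for the source and the destination.
    "postcopy" is a hypothetical capability name.
    """
    msg = set_capability("postcopy", True)
    return {"source": msg, "destination": msg}


if __name__ == "__main__":
    for side, msg in sorted(postcopy_setup().items()):
        print(side, json.dumps(msg))
```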

Regards,
Orit 

On 10/30/2012 10:32 AM, Isaku Yamahata wrote:
> This is the v3 patch series of postcopy migration.
> 
> The trees is available at
> git://github.com/yamahata/qemu.git qemu-postcopy-oct-30-2012
> git://github.com/yamahata/linux-umem.git linux-umem-oct-29-2012
> 
> Major changes v2 -> v3:
> - implemented pre+post optimization
> - auto detection of postcopy by incoming side
> - using threads on destination instead of fork
> - using blocking io instead of select + non-blocking io loop
> - less memory overhead
> - various improvement and code simplification
> - kernel module name change umem -> uvmem to avoid name conflict.
> 
> Patches organization:
> 1-2: trivial fixes
> 3-5: prepartion for threading. cherry-picked from migration tree
> 6-18: refactoring existing code and preparation
> 19-25: implement postcopy live migration itself (essential part)
> 26-35: optimization/heuristic for postcopy
> 
> Usage
> =====
> You need load uvmem character device on the host before starting migration.
> Postcopy can be used for tcg and kvm accelarator. The implementation depend
> on only linux uvmem character device. But the driver dependent code is split
> into a file.
> I tested only host page size == guest page size case, but the implementation
> allows host page size != guest page size case.
> 
> The following options are added with this patch series.
> - incoming part
>   use -incoming as usual. Postcopy is automatically detected.
>   example:
>   qemu -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
> 
> - outging part
>   options for migrate command
>   migrate [-p [-n] [-m]] URI 
>           [<precopy count> [<prefault forward> [<prefault backword>]]]
> 
>   Newly added options/arguments
>   -p: indicate postcopy migration
>   -n: disable background transferring pages: This is for benchmark/debugging
>   -m: move background transfer of postcopy mode
>   <precopy count>: The number of precopy RAM scan before postcopy.
>                    default 0 (0 means no precopy)
>   <prefault forward>: The number of forward pages which is sent with on-demand
>   <prefault backward>: The number of backward pages which is sent with
>                        on-demand
> 
>   example:
>   migrate -p -n tcp:<dest ip address>:4444
>   migrate -p -n -m tcp:<dest ip address>:4444 42 42 0
> 
> 
> TODO
> ====
> - benchmark/evaluation
> - improve/optimization
>   At the moment at least what I'm aware of is
>   - pre+post case
>     On desitnation side reading dirty bitmap would cause long latency.
>     create thread for that.
> - consider on FUSE/CUSE possibility
> 
> basic postcopy work flow
> ========================
>         qemu on the destination
>               |
>               V
>         open(/dev/uvmem)
>               |
>               V
>         UVMEM_INIT
>               |
>               V
>         Here we have two file descriptors to
>         umem device and shmem file
>               |
>               |                                  umem threads
>               |                                  on the destination
>               |
>               V    create pipe to communicate
>         crete threads--------------------------------,
>               |                                      |
>               V                                   mmap(shmem file)
>         mmap(uvmem device) for guest RAM          close(shmem file)
>               |                                      |
>               |                                      |
>               V                                      |
>         wait for ready from daemon <----pipe-----send ready message
>               |                                      |
>               |                                 Here the daemon takes over
>         send ok------------pipe---------------> the owner of the socket
>               |				        to the source
>               V                                      |
>         entering post copy stage                     |
>         start guest execution                        |
>               |                                      |
>               V                                      V
>         access guest RAM                          read() to get faulted pages
>               |                                      |
>               V                                      V
>         page fault ------------------------------>page offset is returned
>         block                                        |
>                                                      V
>                                                   pull page from the source
>                                                   write the page contents
>                                                   to the shmem.
>                                                      |
>                                                      V
>         unblock     <-----------------------------write() to tell served pages
>         the fault handler returns the page           |
>         page fault is resolved                       |
>               |                                      V
>               |                                   touch guest RAM pages
>               |                                      |
>               |                                      V
>               |                                   release the cached page
>               |                                   madvise(MADV_REMOVE)
> 	      |
> 	      |
>               |                                   pages can be sent
>               |                                   backgroundly
>               |                                      |
>               |                                      V
>               |                                   mark page is cached
>               |                                   Thus future page fault is
>               |                                   avoided.
>               |                                      |
>               |                                      V
>               |                                   touch guest RAM pages
>               |                                      |
>               |                                      V
>               |                                   release the cached page
>               |                                   madvise(MADV_REMOVE)
>               |                                      |
>               V                                      V
> 
>                  all the pages are pulled from the source
> 
>               |                                      |
>               V                                      V
>         migration completes                        exit()
> 
> 
> Isaku Yamahata (32):
>   migration.c: remove redundant line in migrate_init()
>   arch_init: DPRINTF format error and typo
>   osdep: add qemu_read_full() to read interrupt-safely
>   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip,
>     qemu_fflush
>   savevm/QEMUFile: consolidate QEMUFile functions a bit
>   savevm/QEMUFile: introduce qemu_fopen_fd
>   savevm/QEMUFile: add read/write QEMUFile on memory buffer
>   savevm, buffered_file: introduce method to drain buffer of buffered
>     file
>   arch_init: export RAM_SAVE_xxx flags for postcopy
>   arch_init/ram_save: introduce constant for ram save version = 4
>   arch_init: refactor ram_save_block() and export ram_save_block()
>   arch_init/ram_save_setup: factor out bitmap alloc/free
>   arch_init/ram_load: refactor ram_load
>   arch_init: factor out logic to find ram block with id string
>   migration: export migrate_fd_completed() and migrate_fd_cleanup()
>   uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
>   osdep: add QEMU_MADV_REMOVE and tirivial fix
>   postcopy: introduce helper functions for postcopy
>   savevm: add new section that is used by postcopy
>   postcopy: implement incoming part of postcopy live migration
>   postcopy outgoing: add -p option to migrate command
>   postcopy: implement outgoing part of postcopy live migration
>   postcopy/outgoing: add -n options to disable background transfer
>   postcopy/outgoing: implement forward/backword prefault
>   arch_init: factor out setting last_block, last_offset
>   postcopy/outgoing: add movebg mode(-m) to migration command
>   arch_init: factor out ram_load
>   arch_init: export ram_save_iterate()
>   postcopy: pre+post optimization incoming side
>   arch_init: export migration_bitmap_sync and helper method to get
>     bitmap
>   postcopy/outgoing: introduce precopy_count parameter
>   postcopy: pre+post optimization outgoing side
> 
> Paolo Bonzini (1):
>   split MRU ram list
> 
> Umesh Deshpande (2):
>   add a version number to ram_list
>   protect the ramlist with a separate mutex
> 
>  Makefile.target                 |    2 +
>  arch_init.c                     |  391 +++++---
>  arch_init.h                     |   24 +
>  buffered_file.c                 |   59 +-
>  buffered_file.h                 |    1 +
>  cpu-all.h                       |   16 +-
>  exec.c                          |   62 +-
>  hmp-commands.hx                 |   21 +-
>  hmp.c                           |   12 +-
>  linux-headers/linux/uvmem.h     |   41 +
>  migration-exec.c                |    8 +-
>  migration-fd.c                  |   23 +-
>  migration-postcopy.c            | 2019 +++++++++++++++++++++++++++++++++++++++
>  migration-tcp.c                 |   16 +-
>  migration-unix.c                |   36 +-
>  migration.c                     |   65 +-
>  migration.h                     |   42 +
>  osdep.c                         |   24 +
>  osdep.h                         |   13 +-
>  qapi-schema.json                |    6 +-
>  qemu-common.h                   |    2 +
>  qemu-file.h                     |   12 +-
>  qmp-commands.hx                 |    4 +-
>  savevm.c                        |  223 ++++-
>  scripts/update-linux-headers.sh |    2 +-
>  sysemu.h                        |    2 +-
>  umem.c                          |  291 ++++++
>  umem.h                          |   88 ++
>  vl.c                            |    5 +-
>  29 files changed, 3265 insertions(+), 245 deletions(-)
>  create mode 100644 linux-headers/linux/uvmem.h
>  create mode 100644 migration-postcopy.c
>  create mode 100644 umem.c
>  create mode 100644 umem.h
> 
> --
> 1.7.10.4
> 

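The <prefault forward> and <prefault backward> arguments in the quoted cover
letter can be pictured with a small Python sketch. It is an illustration
under stated assumptions, not the code from migration-postcopy.c: the
function name, the flat page-index model, the ordering (faulted page first,
then forward, then backward) and the clamping at the ends of RAM are all
hypothetical.

```python
def prefault_pages(faulted, forward, backward, nr_pages):
    """Return the page indexes to send for one on-demand request.

    faulted:  index of the page the destination asked for
    forward:  number of following pages to send along with it
    backward: number of preceding pages to send along with it
    nr_pages: total number of pages in this (hypothetical) flat RAM
    """
    if not 0 <= faulted < nr_pages:
        raise ValueError("faulted page out of range")
    if forward < 0 or backward < 0:
        raise ValueError("prefault counts must be non-negative")

    # The requested page goes first so the blocked fault resolves soonest.
    pages = [faulted]
    # Pages after the faulted one, clamped to the last page.
    last = min(faulted + forward, nr_pages - 1)
    pages.extend(range(faulted + 1, last + 1))
    # Pages before the faulted one, nearest first, clamped to page 0.
    first = max(faulted - backward, 0)
    pages.extend(range(faulted - 1, first - 1, -1))
    return pages


if __name__ == "__main__":
    print(prefault_pages(10, 2, 1, 100))
```

A real implementation would presumably also skip pages that have already
been sent; this sketch ignores that bookkeeping.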