[Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app
@ 2012-05-22 12:56 Orit Wasserman
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 1/9] Add MigrationParams structure Orit Wasserman
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, Petter Svard, stefanha, mdroth,
	Benoit Hudzia, blauwirbel, Orit Wasserman, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Changes from v10:
	- cache size is now given in bytes; if it is not a power of 2, it is
	  rounded down to the nearest power of 2.
	- fix documentation
	- call cache_init with the number of pages, not the cache size.
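Here is a minimal sketch of the new cache-size rule: round a byte count down to a power of 2, then turn it into the page count handed to cache_init. This is not the code from this series. The helper names round_down_pow2 and cache_num_pages are hypothetical, and the example assumes a 4096-byte page.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count down to the nearest power of 2 (0 stays 0). */
static uint64_t round_down_pow2(uint64_t n)
{
    while (n & (n - 1)) {
        n &= n - 1;     /* clear the lowest set bit until one bit is left */
    }
    return n;
}

/* Hypothetical helper: number of page slots a cache of cache_bytes holds. */
static int64_t cache_num_pages(uint64_t cache_bytes, unsigned int page_size)
{
    assert(page_size != 0);
    return (int64_t)(round_down_pow2(cache_bytes) / page_size);
}
```

For example, cache_num_pages(300ULL << 20, 4096) returns 65536. A 300 MiB request is rounded down to 256 MiB, which holds 65536 pages of 4 KiB.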

Changes from v9:
	- move the cache implementation to separate files. We kept our own
	  implementation because GCache and GHashTable have no size limit.
	- add migrate_set_parameter function
	- remove XBZRLE option from migrate command
	- add cache size information to query_migrate command
	- add documentation file
	- write/read the exact XBZRLE header format
	- fix other review comments by Anthony and Juan

Changes from v8:
	Implement a more efficient cache_resize method
	fix set_cachesize command

Changes from v7:
	Copy the current page before encoding it; this prevents the page content
	from changing during encoding.
	Allow changing the cache size during an active migration.
	Fix comments by Avi.

Changes from v6:
 1) add assert checks to ULEB encoding/decoding
 2) no need to send last zero run
	
Changes from v5:
1) Add migration capabilities
2) Use ULEB to encode run length
3) Do not send pages that are dirty but unmodified
4) Fix other patch comments

Using GCache or GHashTable requires allocating a new buffer on every content
change, and neither has a size limit, so I decided to keep the simple cache
implementation.

Changes from v4:
1) Rebase
2) divide patch into 9 patches
3) move memory allocation into cache_insert

Future work:
     Use SSE for encoding.
     Page ranking according to dirty rate and automatic activation/deactivation of the feature (will be sent in a separate patch series).

By using XBZRLE (Xor Based Zero Run Length Encoding) we can reduce VM downtime
and the total live-migration time of VMs running memory-write-intensive workloads
typical of large enterprise applications such as SAP ERP Systems, and generally
speaking for any application with a sparse memory update pattern.

The compression format takes advantage of the fact that the XOR of the old and
new page contents will contain many zero bytes (zero represents an unchanged
value).
We represent the page data delta as alternating zero and non-zero runs.
We represent a zero run by its length (in bytes).
We represent a non-zero run by its length (in bytes) followed by the data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer
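Here is a minimal sketch of the run-length encoding, following the ULEB128 scheme at the link above: 7 value bits per byte, least significant group first, with the high bit set while more bytes follow. The function names are hypothetical. This is not the uleb helper from patch 5/9. The 5-byte limit assumes 32-bit run lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n as ULEB128 into out; returns the number of bytes written
 * (at most 5 for a 32-bit value). */
static size_t uleb128_encode(uint32_t n, uint8_t *out)
{
    size_t i = 0;

    assert(out != NULL);
    do {
        uint8_t byte = n & 0x7f;    /* low 7 bits of what is left */
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;           /* continuation: more bytes follow */
        }
        out[i++] = byte;
    } while (n != 0);
    return i;
}

/* Decode a ULEB128 value from in, reading at most len bytes.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than 5 bytes. */
static size_t uleb128_decode(const uint8_t *in, size_t len, uint32_t *n)
{
    uint32_t val = 0;
    size_t i;

    assert(in != NULL && n != NULL);
    for (i = 0; i < len && i < 5; i++) {
        val |= (uint32_t)(in[i] & 0x7f) << (7 * i);
        if ((in[i] & 0x80) == 0) {  /* last byte of this value */
            *n = val;
            return i + 1;
        }
    }
    return 0;
}
```

For example, a run of 4096 bytes encodes as the two bytes 0x80 0x20. Any run shorter than 128 bytes takes a single byte.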

On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from an LRU cache (default size of 512 MB). The
receiving side uses the existing page content and XBZRLE to decode the new page
content.

This is a more compact way to store the delta than the previous version.
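The sender/receiver split can be sketched as follows. This illustrates the format only. It is not the code in patch 7/9, and several details are assumptions:
- the function names are hypothetical;
- a non-zero run carries the new bytes of the changed region (the format text only says "the data");
- a trailing zero run is omitted, as noted in the v6 changes;
- the encoder returns -1 when the result would not fit in the destination buffer, which corresponds to the "overflow" counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ULEB128 encode n into out; returns the number of bytes written. */
static size_t put_uleb(uint32_t n, uint8_t *out)
{
    size_t i = 0;
    do {
        out[i++] = (n & 0x7f) | (n > 0x7f ? 0x80 : 0);  /* high bit: more */
        n >>= 7;
    } while (n);
    return i;
}

/* ULEB128 decode from in (len bytes); returns bytes used, 0 on error. */
static size_t get_uleb(const uint8_t *in, size_t len, uint32_t *n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        v |= (uint32_t)(in[i] & 0x7f) << (7 * i);
        if (!(in[i] & 0x80)) {
            *n = v;
            return i + 1;
        }
    }
    return 0;                       /* truncated or too long */
}

/* Encode the delta from old to cur (len bytes each) into dst (dlen bytes).
 * Returns the encoded size (0 means unchanged) or -1 on overflow. */
static int xbzrle_encode_sketch(const uint8_t *old, const uint8_t *cur,
                                size_t len, uint8_t *dst, size_t dlen)
{
    size_t i = 0, d = 0;

    assert(old && cur && dst);
    while (i < len) {
        size_t zstart = i, nzstart, nzlen, hdr;
        uint8_t tmp[10];

        while (i < len && old[i] == cur[i]) {   /* XOR byte is zero */
            i++;
        }
        if (i == len) {
            break;                  /* the trailing zero run is not sent */
        }
        nzstart = i;
        while (i < len && old[i] != cur[i]) {   /* XOR byte is non-zero */
            i++;
        }
        nzlen = i - nzstart;
        hdr = put_uleb((uint32_t)(nzstart - zstart), tmp);
        hdr += put_uleb((uint32_t)nzlen, tmp + hdr);
        if (d + hdr + nzlen > dlen) {
            return -1;              /* overflow: the delta does not fit */
        }
        memcpy(dst + d, tmp, hdr);
        memcpy(dst + d + hdr, cur + nzstart, nzlen);
        d += hdr + nzlen;
    }
    return (int)d;
}

/* Apply an encoded delta to page (len bytes) in place; -1 if malformed. */
static int xbzrle_decode_sketch(const uint8_t *src, size_t slen,
                                uint8_t *page, size_t len)
{
    size_t s = 0, p = 0, n;
    uint32_t zrun, nzrun;

    while (s < slen) {
        if ((n = get_uleb(src + s, slen - s, &zrun)) == 0) {
            return -1;
        }
        s += n;
        if ((n = get_uleb(src + s, slen - s, &nzrun)) == 0) {
            return -1;
        }
        s += n;
        p += zrun;                  /* skip the unchanged bytes */
        if (p > len || nzrun > len - p || nzrun > slen - s) {
            return -1;
        }
        memcpy(page + p, src + s, nzrun);   /* copy the changed bytes */
        p += nzrun;
        s += nzrun;
    }
    return 0;
}
```

This also shows why overflow happens. Suppose the delta budget is one page and every second byte of a 4096-byte page changes. Each changed byte then costs a 1-byte zero-run length, a 1-byte non-zero-run length and 1 data byte. That adds up to 2048 × 3 = 6144 bytes, so the page is sent uncompressed instead.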

This work was originally based on research results published at VEE 2011:
"Evaluation of Delta Compression Techniques for Efficient Live Migration of
Large Virtual Machines" by Benoit, Svard, Tordsson and Elmroth. The XBRLE delta
encoder described there was further improved into XBZRLE.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
ideal for in-line, real-time encoding such as is needed for live-migration.

A typical usage scenario:
    {qemu} migrate_set_cachesize 256m
    {qemu} migrate_set_parameter xbzrle
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    ...
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    cache size: D bytes
    xbzrle transferred: E kbytes
    xbzrle pages: F pages
    xbzrle cache miss: G
    xbzrle overflow: H

Testing: with XBZRLE, live migration completed in 110 seconds; without XBZRLE,
the migration was not able to complete.

A simple synthetic memory r/w load generator:
    #include <stdio.h>
    #include <stdlib.h>

    /* Keep dirtying one byte in every 1 KiB of a 16 MiB buffer. */
    int main(void)
    {
        char *buf = calloc(4096, 4096);
        int i;

        if (!buf) {
            return 1;
        }
        while (1) {
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
            fflush(stdout);
        }
    }

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>

Orit Wasserman (9):
  Add MigrationParams structure
  Add migration capabilites
  Add XBZRLE documentation
  Add cache handling functions
  Add uleb encoding/decoding functions
  Add save_block_hdr function
  Add XBZRLE to ram_save_block and ram_save_live
  Add set_cachesize command
  Add XBZRLE statistics

 Makefile.objs        |    1 +
 arch_init.c          |  322 ++++++++++++++++++++++++++++++++++++++++++++++----
 block-migration.c    |    8 +-
 cache.c              |  219 ++++++++++++++++++++++++++++++++++
 cutils.c             |   29 +++++
 docs/xbzrle.txt      |  114 ++++++++++++++++++
 hmp-commands.hx      |   34 ++++++
 hmp.c                |   67 +++++++++++
 hmp.h                |    3 +
 include/qemu/cache.h |   81 +++++++++++++
 migration.c          |  127 +++++++++++++++++++--
 migration.h          |   29 ++++-
 monitor.c            |    7 +
 qapi-schema.json     |   87 +++++++++++++-
 qemu-common.h        |   19 +++
 qmp-commands.hx      |   98 +++++++++++++++
 savevm.c             |  102 +++++++++++++++-
 sysemu.h             |    3 +-
 vmstate.h            |    2 +-
 19 files changed, 1304 insertions(+), 48 deletions(-)
 create mode 100644 cache.c
 create mode 100644 docs/xbzrle.txt
 create mode 100644 include/qemu/cache.h

-- 
1.7.7.6


* [Qemu-devel] [PATCH v11 1/9] Add MigrationParams structure
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
@ 2012-05-22 12:56 ` Orit Wasserman
  2012-06-01 10:51   ` Juan Quintela
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 2/9] Add migration capabilites Orit Wasserman
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, stefanha, mdroth, blauwirbel,
	Orit Wasserman, chegu_vinod, avi, pbonzini, eblake,
	Isaku Yamahata

From: Isaku Yamahata <yamahata@valinux.co.jp>

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 block-migration.c |    8 ++++----
 migration.c       |   13 ++++++++-----
 migration.h       |    8 ++++++--
 qemu-common.h     |    1 +
 savevm.c          |   11 ++++++++---
 sysemu.h          |    3 ++-
 vmstate.h         |    2 +-
 7 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index fd2ffff..b95b4e1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -700,13 +700,13 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static void block_set_params(const MigrationParams *params, void *opaque)
 {
-    block_mig_state.blk_enable = blk_enable;
-    block_mig_state.shared_base = shared_base;
+    block_mig_state.blk_enable = params->blk;
+    block_mig_state.shared_base = params->shared;
 
     /* shared base means that blk_enable = 1 */
-    block_mig_state.blk_enable |= shared_base;
+    block_mig_state.blk_enable |= params->shared;
 }
 
 void blk_mig_init(void)
diff --git a/migration.c b/migration.c
index 3f485d3..810727f 100644
--- a/migration.c
+++ b/migration.c
@@ -352,7 +352,7 @@ void migrate_fd_connect(MigrationState *s)
                                       migrate_fd_close);
 
     DPRINTF("beginning savevm\n");
-    ret = qemu_savevm_state_begin(s->file, s->blk, s->shared);
+    ret = qemu_savevm_state_begin(s->file, &s->params);
     if (ret < 0) {
         DPRINTF("failed, %d\n", ret);
         migrate_fd_error(s);
@@ -361,15 +361,14 @@ void migrate_fd_connect(MigrationState *s)
     migrate_fd_put_ready(s);
 }
 
-static MigrationState *migrate_init(int blk, int inc)
+static MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
 
     memset(s, 0, sizeof(*s));
     s->bandwidth_limit = bandwidth_limit;
-    s->blk = blk;
-    s->shared = inc;
+    s->params = *params;
 
     s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
@@ -394,9 +393,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  Error **errp)
 {
     MigrationState *s = migrate_get_current();
+    MigrationParams params;
     const char *p;
     int ret;
 
+    params.blk = blk;
+    params.shared = inc;
+
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -411,7 +414,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }
 
-    s = migrate_init(blk, inc);
+    s = migrate_init(&params);
 
     if (strstart(uri, "tcp:", &p)) {
         ret = tcp_start_outgoing_migration(s, p, errp);
diff --git a/migration.h b/migration.h
index 2e9ca2e..4168883 100644
--- a/migration.h
+++ b/migration.h
@@ -19,6 +19,11 @@
 #include "notify.h"
 #include "error.h"
 
+struct MigrationParams {
+    int blk;
+    int shared;
+};
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -31,8 +36,7 @@ struct MigrationState
     int (*close)(MigrationState *s);
     int (*write)(MigrationState *s, const void *buff, size_t size);
     void *opaque;
-    int blk;
-    int shared;
+    MigrationParams params;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-common.h b/qemu-common.h
index cccfb42..231c012 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -17,6 +17,7 @@ typedef struct DeviceState DeviceState;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct MigrationParams MigrationParams;
 
 /* we put basic includes here to avoid repeating them in device drivers */
 #include <stdlib.h>
diff --git a/savevm.c b/savevm.c
index 2d18bab..dd66f2c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1561,7 +1561,8 @@ bool qemu_savevm_state_blocked(Error **errp)
     return false;
 }
 
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
+int qemu_savevm_state_begin(QEMUFile *f,
+                            const MigrationParams *params)
 {
     SaveStateEntry *se;
     int ret;
@@ -1570,7 +1571,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
         if(se->set_params == NULL) {
             continue;
 	}
-	se->set_params(blk_enable, shared, se->opaque);
+        se->set_params(params, se->opaque);
     }
     
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1708,13 +1709,17 @@ void qemu_savevm_state_cancel(QEMUFile *f)
 static int qemu_savevm_state(QEMUFile *f)
 {
     int ret;
+    MigrationParams params = {
+        .blk = 0,
+        .shared = 0
+    };
 
     if (qemu_savevm_state_blocked(NULL)) {
         ret = -EINVAL;
         goto out;
     }
 
-    ret = qemu_savevm_state_begin(f, 0, 0);
+    ret = qemu_savevm_state_begin(f, &params);
     if (ret < 0)
         goto out;
 
diff --git a/sysemu.h b/sysemu.h
index bc2c788..6540c79 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -77,7 +77,8 @@ void do_info_snapshots(Monitor *mon);
 void qemu_announce_self(void);
 
 bool qemu_savevm_state_blocked(Error **errp);
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared);
+int qemu_savevm_state_begin(QEMUFile *f,
+                            const MigrationParams *params);
 int qemu_savevm_state_iterate(QEMUFile *f);
 int qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(QEMUFile *f);
diff --git a/vmstate.h b/vmstate.h
index 82d97ae..5af45e0 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -26,7 +26,7 @@
 #ifndef QEMU_VMSTATE_H
 #define QEMU_VMSTATE_H 1
 
-typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
+typedef void SaveSetParamsHandler(const MigrationParams *params, void * opaque);
 typedef void SaveStateHandler(QEMUFile *f, void *opaque);
 typedef int SaveLiveStateHandler(QEMUFile *f, int stage, void *opaque);
 typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.7.6


* [Qemu-devel] [PATCH v11 2/9] Add migration capabilites
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 1/9] Add MigrationParams structure Orit Wasserman
@ 2012-05-22 12:56 ` Orit Wasserman
  2012-05-22 13:08   ` Eric Blake
  2012-06-01 10:57   ` Juan Quintela
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 3/9] Add XBZRLE documentation Orit Wasserman
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, stefanha, mdroth, blauwirbel,
	Orit Wasserman, chegu_vinod, avi, pbonzini, eblake

Add migration capabilities that can be queried by management.
Management can query the source QEMU and the destination QEMU in order to
verify that both support a migration capability (currently only XBZRLE).
Management can enable a capability for the next migration by using the
migrate_set_parameter command.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 hmp-commands.hx  |   16 ++++++++++++++++
 hmp.c            |   41 +++++++++++++++++++++++++++++++++++++++++
 hmp.h            |    2 ++
 migration.c      |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 migration.h      |    2 ++
 monitor.c        |    7 +++++++
 qapi-schema.json |   46 +++++++++++++++++++++++++++++++++++++++++++++-
 qmp-commands.hx  |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 savevm.c         |    2 +-
 9 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 18cb415..e14e7be 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -861,6 +861,20 @@ Set maximum tolerated downtime (in seconds) for migration.
 ETEXI
 
     {
+        .name       = "migrate_set_parameter",
+        .args_type  = "parameter:s",
+        .params     = "parameter",
+        .help       = "Enable the usage of a capability for migration",
+        .mhandler.cmd = hmp_migrate_set_parameter,
+    },
+
+STEXI
+@item migrate_set_parameter @var{parameter}
+@findex migrate_set_parameter
+Enable the usage of a capability @var{parameter} for migration.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
@@ -1393,6 +1407,8 @@ show CPU statistics
 show user network stack connection states
 @item info migrate
 show migration status
+@item info migration_capabilities
+show migration capabilities
 @item info balloon
 show balloon information
 @item info qtree
diff --git a/hmp.c b/hmp.c
index bb0952e..9582400 100644
--- a/hmp.c
+++ b/hmp.c
@@ -128,9 +128,18 @@ void hmp_info_mice(Monitor *mon)
 void hmp_info_migrate(Monitor *mon)
 {
     MigrationInfo *info;
+    MigrationCapabilityInfoList *cap;
 
     info = qmp_query_migrate(NULL);
 
+    if (info->has_params && info->params) {
+        monitor_printf(mon, "params: ");
+        for (cap = info->params; cap; cap = cap->next) {
+            monitor_printf(mon, "%s ",
+                           MigrationCapability_lookup[cap->value->capability]);
+        }
+        monitor_printf(mon, "\n");
+    }
     if (info->has_status) {
         monitor_printf(mon, "Migration status: %s\n", info->status);
     }
@@ -156,6 +165,24 @@ void hmp_info_migrate(Monitor *mon)
     qapi_free_MigrationInfo(info);
 }
 
+void hmp_info_migration_capabilities(Monitor *mon)
+{
+    MigrationCapabilityInfoList *caps_list, *cap;
+
+    caps_list = qmp_query_migration_capabilities(NULL);
+    if (!caps_list) {
+        monitor_printf(mon, "No migration capabilities found\n");
+        return;
+    }
+
+    for (cap = caps_list; cap; cap = cap->next) {
+        monitor_printf(mon, "%s\n",
+                       MigrationCapability_lookup[cap->value->capability]);
+    }
+
+    qapi_free_MigrationCapabilityInfoList(caps_list);
+}
+
 void hmp_info_cpus(Monitor *mon)
 {
     CpuInfoList *cpu_list, *cpu;
@@ -730,6 +757,20 @@ void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict)
     qmp_migrate_set_speed(value, NULL);
 }
 
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
+{
+    const char *value = qdict_get_str(qdict, "parameter");
+    Error *err = NULL;
+
+    qmp_migrate_set_parameter(value, &err);
+
+    if (err) {
+        monitor_printf(mon, "migrate_set_parameter: %s\n",
+                       error_get_pretty(err));
+        error_free(err);
+    }
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 443b812..5f9d842 100644
--- a/hmp.h
+++ b/hmp.h
@@ -25,6 +25,7 @@ void hmp_info_uuid(Monitor *mon);
 void hmp_info_chardev(Monitor *mon);
 void hmp_info_mice(Monitor *mon);
 void hmp_info_migrate(Monitor *mon);
+void hmp_info_migration_capabilities(Monitor *mon);
 void hmp_info_cpus(Monitor *mon);
 void hmp_info_block(Monitor *mon);
 void hmp_info_blockstats(Monitor *mon);
@@ -51,6 +52,7 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration.c b/migration.c
index 810727f..952f542 100644
--- a/migration.c
+++ b/migration.c
@@ -117,10 +117,22 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 {
     MigrationInfo *info = g_malloc0(sizeof(*info));
     MigrationState *s = migrate_get_current();
+    int i;
 
     switch (s->state) {
     case MIG_STATE_SETUP:
-        /* no migration has happened ever */
+        /* no migration has happened yet; show the enabled capabilities */
+        for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+            if (s->enabled_capabilities[i]) {
+                /* prepend each enabled capability to info->params */
+                MigrationCapabilityInfoList *cap = g_malloc0(sizeof(*cap));
+                cap->value = g_malloc(sizeof(*cap->value));
+                cap->value->capability = i;
+                cap->next = info->params;
+                info->params = cap;
+                info->has_params = true;
+            }
+        }
         break;
     case MIG_STATE_ACTIVE:
         info->has_status = true;
@@ -157,6 +169,38 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     return info;
 }
 
+MigrationCapabilityInfoList *qmp_query_migration_capabilities(Error **errp)
+{
+    MigrationCapabilityInfoList *caps_list = g_malloc0(sizeof(*caps_list));
+
+    caps_list->value = g_malloc(sizeof(*caps_list->value));
+    caps_list->value->capability = MIGRATION_CAPABILITY_XBZRLE;
+    caps_list->next = NULL;
+
+    return caps_list;
+}
+
+
+void qmp_migrate_set_parameter(const char *parameter, Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+    int i;
+
+    if (s->state == MIG_STATE_ACTIVE) {
+        error_set(errp, QERR_MIGRATION_ACTIVE);
+        return;
+    }
+
+    for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+        if (strcmp(parameter, MigrationCapability_lookup[i]) == 0) {
+            s->enabled_capabilities[i] = true;
+            return;
+        }
+    }
+
+    error_set(errp, QERR_INVALID_PARAMETER, parameter);
+}
+
 /* shared migration helpers */
 
 static int migrate_fd_cleanup(MigrationState *s)
@@ -365,12 +409,17 @@ static MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
+    bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
+
+    memcpy(enabled_capabilities, s->enabled_capabilities,
+           sizeof(enabled_capabilities));
 
     memset(s, 0, sizeof(*s));
     s->bandwidth_limit = bandwidth_limit;
     s->params = *params;
+    memcpy(s->enabled_capabilities, enabled_capabilities,
+           sizeof(enabled_capabilities));
 
-    s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
 
     return s;
diff --git a/migration.h b/migration.h
index 4168883..00d1992 100644
--- a/migration.h
+++ b/migration.h
@@ -18,6 +18,7 @@
 #include "qemu-common.h"
 #include "notify.h"
 #include "error.h"
+#include "qapi-types.h"
 
 struct MigrationParams {
     int blk;
@@ -37,6 +38,7 @@ struct MigrationState
     int (*write)(MigrationState *s, const void *buff, size_t size);
     void *opaque;
     MigrationParams params;
+    bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/monitor.c b/monitor.c
index 12a6fe2..0233bc3 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2558,6 +2558,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.info = hmp_info_migrate,
     },
     {
+        .name       = "migration_capabilities",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show migration capabilities",
+        .mhandler.info = hmp_info_migration_capabilities,
+    },
+    {
         .name       = "balloon",
         .args_type  = "",
         .params     = "",
diff --git a/qapi-schema.json b/qapi-schema.json
index 2ca7195..2887c51 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -264,7 +264,7 @@
 ##
 { 'type': 'MigrationInfo',
   'data': {'*status': 'str', '*ram': 'MigrationStats',
-           '*disk': 'MigrationStats'} }
+           '*disk': 'MigrationStats', '*params': ['MigrationCapabilityInfo']} }
 
 ##
 # @query-migrate
@@ -278,6 +278,50 @@
 { 'command': 'query-migrate', 'returns': 'MigrationInfo' }
 
 ##
+# @MigrationCapability
+#
+# Migration capabilities enumeration
+#
+# @xbzrle: current migration supports xbzrle
+#
+# Since: 1.2
+##
+{ 'enum': 'MigrationCapability',
+  'data': ['xbzrle'] }
+
+##
+# @MigrationCapabilityInfo
+#
+# Migration capability information
+#
+# @capability: capability enum
+#
+# Since: 1.2
+##
+{ 'type': 'MigrationCapabilityInfo',
+  'data': { 'capability' : 'MigrationCapability'} }
+
+##
+# @query-migration-capabilities
+#
+# Returns information about the current migration process capabilities.
+#
+# Returns: @MigrationCapabilityInfo list
+#
+# Since: 1.2
+##
+{ 'command': 'query-migration-capabilities', 'returns': ['MigrationCapabilityInfo'] }
+
+##
+# @migrate-set-parameter
+#
+# Enable a migration capability (for example xbzrle) for the next migration.
+#
+# Since: 1.2
+##
+{ 'command': 'migrate-set-parameter', 'data': { 'parameter': 'str' } }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index db980fa..7750f2f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2068,6 +2068,53 @@ EQMP
     },
 
 SQMP
+query-migration-capabilities
+----------------------------
+
+Query migration capabilities
+
+- "xbzrle": xbzrle support
+
+Arguments:
+
+Example:
+
+-> { "execute": "query-migration-capabilities"}
+<- { "return": [ { "capability": "xbzrle" } ] }
+
+EQMP
+
+    {
+        .name       = "query-migration-capabilities",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_migration_capabilities,
+    },
+
+SQMP
+migrate_set_parameter
+---------------------
+
+Enable migration parameter
+
+- "xbzrle": xbzrle support
+
+Arguments:
+
+Example:
+
+-> { "execute": "migrate_set_parameter", "arguments": { "parameter": "xbzrle" } }
+
+EQMP
+
+    {
+        .name       = "migrate_set_parameter",
+        .args_type  = "parameter:s",
+        .mhandler.cmd_new = qmp_marshal_input_migrate_set_parameter,
+    },
+
+
+
+SQMP
 query-balloon
 -------------
 
diff --git a/savevm.c b/savevm.c
index dd66f2c..42937a0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1711,7 +1711,7 @@ static int qemu_savevm_state(QEMUFile *f)
     int ret;
     MigrationParams params = {
         .blk = 0,
-        .shared = 0
+        .shared = 0,
     };
 
     if (qemu_savevm_state_blocked(NULL)) {
-- 
1.7.7.6


* [Qemu-devel] [PATCH v11 3/9] Add XBZRLE documentation
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 1/9] Add MigrationParams structure Orit Wasserman
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 2/9] Add migration capabilites Orit Wasserman
@ 2012-05-22 12:56 ` Orit Wasserman
  2012-05-22 13:13   ` Eric Blake
  2012-06-01 10:58   ` Juan Quintela
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 4/9] Add cache handling functions Orit Wasserman
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, stefanha, mdroth, blauwirbel,
	Orit Wasserman, chegu_vinod, avi, pbonzini, eblake

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 docs/xbzrle.txt |  114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 114 insertions(+), 0 deletions(-)
 create mode 100644 docs/xbzrle.txt

diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
new file mode 100644
index 0000000..16950d5
--- /dev/null
+++ b/docs/xbzrle.txt
@@ -0,0 +1,114 @@
+XBZRLE (Xor Based Zero Run Length Encoding)
+===========================================
+
+Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction of VM
+downtime and the total live-migration time of virtual machines.
+It is particularly useful for virtual machines running memory-write-intensive
+workloads that are typical of large enterprise applications such as SAP ERP
+Systems, and generally speaking for any application that uses a sparse memory
+update pattern.
+
+Instead of sending the changed guest memory page, this solution will send a
+compressed version of the updates, thus reducing the amount of data sent during
+live migration.
+In order to calculate the update, the previous contents of each memory page
+must be stored. They are kept in a dedicated cache (hash table) and are
+accessed by their address.
+The larger the cache size the better the chances are that the page has already
+been stored in the cache.
+A small cache size will result in a high cache-miss rate.
+Cache size can be changed before and during migration.
+
+Format
+=======
+
+The compression format relies on the XOR of the old and new page contents,
+where zero represents an unchanged value.
+The page data delta is represented by zero and non-zero runs.
+A zero run is represented by its length (in bytes).
+A non-zero run is represented by its length (in bytes) and the data.
+The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).
+
+page = zrun nzrun
+       | zrun nzrun page
+
+zrun = length
+
+nzrun = length byte...
+
+length = uleb128 encoded integer
+
+On the sender side XBZRLE is used as a compact delta encoding of page updates,
+retrieving the old page content from the cache (default size of 512 MB). The
+receiving side uses the existing page's content and XBZRLE to decode the new
+page's content.
+
+This is a more compact way to store the deltas than the previous version.
+
+This work was originally based on research results published at
+VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
+Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
+The XBRLE delta encoder described there was further improved into
+XBZRLE.
+
+XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
+ideal for in-line, real-time encoding such as is needed for live-migration.
+
+Migration Capabilities
+======================
+In order to use XBZRLE, the destination QEMU must be able to decode the new
+format.
+A new migration capabilities command allows external management to query
+for this support.
+A typical use on the destination:
+    {qemu} info migration_capabilities
+    xbzrle
+
+In order to enable capabilities for future live migration,
+a new command migrate_set_parameter is introduced:
+    {qemu} migrate_set_parameter xbzrle
+
+Usage
+======
+
+1. Activate XBZRLE.
+2. Set the XBZRLE cache size. The cache size is in bytes; if it is not a
+power of 2, it is rounded down to the nearest power of 2. Default: 64MBytes.
+3. Start the outgoing migration.
+
+A typical usage scenario:
+    {qemu} migrate_set_parameter xbzrle
+    {qemu} migrate_set_cachesize 256m
+    {qemu} migrate -d tcp:destination.host:4444
+    {qemu} info migrate
+    ...
+    transferred ram-duplicate: A kbytes
+    transferred ram-normal: B kbytes
+    transferred ram-xbrle: C kbytes
+    overflow ram-xbrle: D pages
+    cache-miss ram-xbrle: E pages
+
+cache-miss: the number of cache misses to date; a high cache-miss rate
+indicates that the cache size is set too low.
+overflow: the number of overflows in the encoding, i.e. pages whose delta could
+not be compressed. This can happen if the changes in a page are too large or
+there are many short changes, for example a change in every second byte (half
+a page).
+
+Testing: live migration with XBZRLE completed in 110 seconds, whereas
+without XBZRLE the migration was not able to complete.
+
+A simple synthetic memory r/w load generator:
+    #include <stdlib.h>
+    #include <stdio.h>
+    int main(void)
+    {
+        char *buf = calloc(4096, 4096);  /* 16 MiB, zero-filled */
+        while (1) {
+            int i;
+            for (i = 0; i < 4096 * 4; i++) {
+                buf[i * 4096 / 4]++;     /* dirty one byte per 1 KiB */
+            }
+            printf(".");
+        }
+    }
-- 
1.7.7.6


* [Qemu-devel] [PATCH v11 4/9] Add cache handling functions
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (2 preceding siblings ...)
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 3/9] Add XBZRLE documentation Orit Wasserman
@ 2012-05-22 12:57 ` Orit Wasserman
  2012-06-01 11:01   ` Juan Quintela
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 5/9] Add uleb encoding/decoding functions Orit Wasserman
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, Petter Svard, stefanha, mdroth,
	Benoit Hudzia, blauwirbel, Orit Wasserman, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Add LRU page cache mechanism.
Pages are accessed by their address.
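Here is a minimal sketch of a page cache keyed by address. It assumes a direct-mapped layout in which each page address maps to exactly one slot, (addr / page_size) & (num_items - 1). In this sketch the power-of-2 item count lets a mask replace the modulo. All names (SketchCache, sketch_insert, ...) are hypothetical and are not the include/qemu/cache.h API.

The sketch does not implement true LRU eviction. The age field is only recorded, and inserting a page evicts whatever occupies its slot. Each slot buffer is allocated once and reused, so a content change does not trigger a new allocation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One cache slot (hypothetical, loosely modelled on CacheItem in cache.c). */
typedef struct {
    uint64_t addr;      /* guest page address held in this slot */
    uint64_t age;       /* insertion counter; recorded, not used for eviction */
    uint8_t *data;      /* private copy of the page, NULL while empty */
} SketchItem;

typedef struct {
    SketchItem *items;
    size_t num_items;   /* must be a power of 2 */
    unsigned int page_size;
    uint64_t clock;
} SketchCache;

/* Direct-mapped slot index: the power-of-2 size lets a mask replace modulo. */
static size_t sketch_pos(const SketchCache *c, uint64_t addr)
{
    return (size_t)(addr / c->page_size) & (c->num_items - 1);
}

static bool sketch_init(SketchCache *c, size_t num_items, unsigned int page_size)
{
    if (num_items == 0 || (num_items & (num_items - 1)) != 0 || page_size == 0) {
        return false;
    }
    c->items = calloc(num_items, sizeof(*c->items));
    if (c->items == NULL) {
        return false;
    }
    c->num_items = num_items;
    c->page_size = page_size;
    c->clock = 0;
    return true;
}

/* Return the cached copy of the page at addr, or NULL on a cache miss. */
static const uint8_t *sketch_lookup(const SketchCache *c, uint64_t addr)
{
    const SketchItem *it = &c->items[sketch_pos(c, addr)];

    return (it->data != NULL && it->addr == addr) ? it->data : NULL;
}

/* Copy page into the slot for addr, evicting any other page stored there.
 * The slot buffer is allocated once and then reused. */
static bool sketch_insert(SketchCache *c, uint64_t addr, const uint8_t *page)
{
    SketchItem *it = &c->items[sketch_pos(c, addr)];

    if (it->data == NULL) {
        it->data = malloc(c->page_size);
        if (it->data == NULL) {
            return false;
        }
    }
    memcpy(it->data, page, c->page_size);
    it->addr = addr;
    it->age = ++c->clock;
    return true;
}

static void sketch_free(SketchCache *c)
{
    for (size_t i = 0; i < c->num_items; i++) {
        free(c->items[i].data);
    }
    free(c->items);
    c->items = NULL;
}
```

With 4 slots and 4 KiB pages, addresses 0x1000 and 0x5000 share a slot, so inserting the second page evicts the first. A larger cache makes such collisions, and the resulting cache misses, less likely.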

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 Makefile.objs        |    1 +
 cache.c              |  219 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/qemu/cache.h |   81 ++++++++++++++++++
 qemu-common.h        |   10 +++
 4 files changed, 311 insertions(+), 0 deletions(-)
 create mode 100644 cache.c
 create mode 100644 include/qemu/cache.h

diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..8fed055 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -137,6 +137,7 @@ common-obj-y += qdev.o qdev-properties.o qdev-monitor.o
 common-obj-y += block-migration.o iohandler.o
 common-obj-y += pflib.o
 common-obj-y += bitmap.o bitops.o
+common-obj-y += cache.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/cache.c b/cache.c
new file mode 100644
index 0000000..cc0870b
--- /dev/null
+++ b/cache.c
@@ -0,0 +1,219 @@
+/*
+ * Page cache for qemu
+ * The cache is based on a hash of the page address
+ *
+ * Copyright 2011 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Orit Wasserman  <owasserm@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <strings.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <stdbool.h>
+#include <glib.h>
+#include <strings.h>
+
+#include "qemu-common.h"
+#include "qemu/cache.h"
+
+#ifdef DEBUG_CACHE
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stdout, "cache: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+typedef struct CacheItem CacheItem;
+
+struct CacheItem {
+    uint64_t it_addr;
+    unsigned long it_age;
+    uint8_t *it_data;
+};
+
+struct Cache {
+    CacheItem *page_cache;
+    unsigned int page_size;
+    int64_t max_num_items;
+    uint64_t max_item_age;
+    int64_t num_items;
+};
+
+Cache *cache_init(int64_t num_pages, unsigned int page_size)
+{
+    int i;
+
+    Cache *cache = g_malloc(sizeof(Cache));
+    if (!cache) {
+        DPRINTF("Error allocating Cache\n");
+        return NULL;
+    }
+
+    if (num_pages <= 0) {
+        DPRINTF("invalid number of pages\n");
+        return NULL;
+    }
+
+    /* round down to the nearest power of 2 */
+    if (!is_power_of_2(num_pages)) {
+        num_pages = 1LL << (63 - __builtin_clzll(num_pages));
+        DPRINTF("rounding down to %ld\n", num_pages);
+    }
+    cache->page_size = page_size;
+    cache->num_items = 0;
+    cache->max_item_age = 0;
+    cache->max_num_items = num_pages;
+
+    DPRINTF("Setting cache buckets to %lu\n", cache->max_num_items);
+
+    cache->page_cache = g_malloc((cache->max_num_items) *
+                                 sizeof(CacheItem));
+    if (!cache->page_cache) {
+        DPRINTF("could not allocate cache\n");
+        g_free(cache);
+        return NULL;
+    }
+
+    for (i = 0; i < cache->max_num_items; i++) {
+        cache->page_cache[i].it_data = NULL;
+        cache->page_cache[i].it_age = 0;
+        cache->page_cache[i].it_addr = -1;
+    }
+
+    return cache;
+}
+
+void cache_fini(Cache *cache)
+{
+    int i;
+
+    g_assert(cache);
+    g_assert(cache->page_cache);
+
+    for (i = 0; i < cache->max_num_items; i++) {
+        g_free(cache->page_cache[i].it_data);
+        cache->page_cache[i].it_data = NULL;
+    }
+
+    g_free(cache->page_cache);
+    cache->page_cache = NULL;
+}
+
+static unsigned long cache_get_cache_pos(const Cache *cache, uint64_t address)
+{
+    unsigned long pos;
+
+    g_assert(cache->max_num_items);
+    pos = (address/cache->page_size) & (cache->max_num_items - 1);
+    return pos;
+}
+
+bool cache_is_cached(const Cache *cache, uint64_t addr)
+{
+    unsigned long pos;
+
+    g_assert(cache);
+    g_assert(cache->page_cache);
+
+    pos = cache_get_cache_pos(cache, addr);
+
+    return (cache->page_cache[pos].it_addr == addr);
+}
+
+static CacheItem *cache_get_by_addr(const Cache *cache, uint64_t addr)
+{
+    unsigned long pos;
+
+    g_assert(cache);
+    g_assert(cache->page_cache);
+
+    pos = cache_get_cache_pos(cache, addr);
+
+    return &cache->page_cache[pos];
+}
+
+uint8_t *get_cached_data(const Cache *cache, uint64_t addr)
+{
+    return cache_get_by_addr(cache, addr)->it_data;
+}
+
+void cache_insert(Cache *cache, uint64_t addr, uint8_t *pdata)
+{
+
+    CacheItem *it = NULL;
+
+    g_assert(cache);
+    g_assert(cache->page_cache);
+
+    /* actual update of entry */
+    it = cache_get_by_addr(cache, addr);
+
+    if (!it->it_data) {
+        cache->num_items++;
+    }
+
+    it->it_data = pdata;
+    it->it_age = ++cache->max_item_age;
+    it->it_addr = addr;
+}
+
+int cache_resize(Cache *cache, int64_t new_num_pages)
+{
+    Cache *new_cache;
+    int i;
+
+    CacheItem *old_it, *new_it;
+
+    g_assert(cache);
+
+    /* same size */
+    if (new_num_pages == cache->max_num_items) {
+        return 0;
+    }
+
+    /* cache was not inited */
+    if (cache->page_cache == NULL) {
+        return -1;
+    }
+
+    new_cache = cache_init(new_num_pages, cache->page_size);
+    if (!(new_cache)) {
+        DPRINTF("Error creating new cache\n");
+        return -1;
+    }
+
+    /* move all data from old cache */
+    for (i = 0; i < cache->max_num_items; i++) {
+        old_it = &cache->page_cache[i];
+        if (old_it->it_addr != -1) {
+            /* check for collision; if there is one, keep the first value */
+            new_it = cache_get_by_addr(new_cache, old_it->it_addr);
+            if (new_it->it_data) {
+                g_free(old_it->it_data);
+            } else {
+                cache_insert(new_cache, old_it->it_addr, old_it->it_data);
+            }
+        }
+    }
+    g_free(cache->page_cache);
+    cache->page_cache = new_cache->page_cache;
+    cache->max_num_items = new_cache->max_num_items;
+    cache->num_items = new_cache->num_items;
+
+    g_free(new_cache);
+
+    return 0;
+}
diff --git a/include/qemu/cache.h b/include/qemu/cache.h
new file mode 100644
index 0000000..16145e1
--- /dev/null
+++ b/include/qemu/cache.h
@@ -0,0 +1,81 @@
+/*
+ * Page cache for qemu
+ * The cache is based on a hash of the page address
+ *
+ * Copyright 2011 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Orit Wasserman  <owasserm@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#ifndef CACHE_H
+#define CACHE_H
+
+/* Page cache for storing guest pages */
+typedef struct Cache Cache;
+
+/**
+ * cache_init: Initialize the page cache
+ *
+ * Returns a newly allocated cache or NULL on error.
+ * If @num_pages is not a power of 2, it is rounded down
+ * to the nearest power of 2.
+ *
+ * @num_pages: maximum number of pages the cache can hold
+ * @page_size: cache page size, in bytes
+ */
+Cache *cache_init(int64_t num_pages, unsigned int page_size);
+
+/**
+ * cache_fini: free all cache resources
+ * @cache: pointer to the Cache struct
+ */
+void cache_fini(Cache *cache);
+
+/**
+ * cache_is_cached: Checks to see if the page is cached
+ *
+ * Returns %true if page is cached
+ *
+ * @cache: pointer to the Cache struct
+ * @addr: page addr
+ */
+bool cache_is_cached(const Cache *cache, uint64_t addr);
+
+/**
+ * get_cached_data: Get the data cached for an addr
+ *
+ * Returns pointer to the data in @addr's slot; use cache_is_cached() first
+ *
+ * @cache: pointer to the Cache struct
+ * @addr: page addr
+ */
+uint8_t *get_cached_data(const Cache *cache, uint64_t addr);
+
+/**
+ * cache_insert: insert the page into the cache. The previous value will be overwritten
+ *
+ * @cache: pointer to the Cache struct
+ * @addr: page address
+ * @pdata: pointer to the page
+ */
+void cache_insert(Cache *cache, uint64_t addr, uint8_t *pdata);
+
+/**
+ * cache_resize: resize the page cache. In case of size reduction the extra pages
+ * will be freed
+ *
+ * Returns 0 on success, -1 on error
+ *
+ * @cache: pointer to the Cache struct
+ * @num_pages: new page cache size (in pages)
+ */
+int cache_resize(Cache *cache, int64_t num_pages);
+
+#endif
diff --git a/qemu-common.h b/qemu-common.h
index 231c012..83571e0 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -1,3 +1,4 @@
+
 /* Common header file that is included by all of qemu.  */
 #ifndef QEMU_COMMON_H
 #define QEMU_COMMON_H
@@ -408,6 +409,15 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 /* Round number up to multiple */
 #define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))
 
+static inline bool is_power_of_2(int64_t value)
+{
+    if (!value) {
+        return false;
+    }
+
+    return !(value & (value - 1));
+}
+
 #include "module.h"
 
 #endif
-- 
1.7.7.6

* [Qemu-devel] [PATCH v11 5/9] Add uleb encoding/decoding functions
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (3 preceding siblings ...)
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 4/9] Add cache handling functions Orit Wasserman
@ 2012-05-22 12:57 ` Orit Wasserman
  2012-06-01 11:04   ` Juan Quintela
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 6/9] Add save_block_hdr function Orit Wasserman
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, stefanha, mdroth, blauwirbel,
	Orit Wasserman, chegu_vinod, avi, pbonzini, eblake

Implement Unsigned Little Endian Base 128.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 cutils.c      |   29 +++++++++++++++++++++++++++++
 qemu-common.h |    8 ++++++++
 2 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/cutils.c b/cutils.c
index af308cd..60fb7c8 100644
--- a/cutils.c
+++ b/cutils.c
@@ -549,3 +549,32 @@ int qemu_sendv(int sockfd, struct iovec *iov, int len, int iov_offset)
     return do_sendv_recvv(sockfd, iov, len, iov_offset, 1);
 }
 
+/*
+ * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+ * Input is limited to 14-bit numbers
+ */
+int uleb128_encode_small(uint8_t *out, uint32_t n)
+{
+    g_assert(n <= 0x3fff);
+    if (n < 0x80) {
+        *out++ = n;
+        return 1;
+    } else {
+        *out++ = (n & 0x7f) | 0x80;
+        *out++ = n >> 7;
+        return 2;
+    }
+}
+
+int uleb128_decode_small(const uint8_t *in, uint32_t *n)
+{
+    if (!(*in & 0x80)) {
+        *n = *in++;
+        return 1;
+    } else {
+        *n = *in++ & 0x7f;
+        g_assert(!(*in & 0x80));
+        *n |= *in++ << 7;
+        return 2;
+    }
+}
diff --git a/qemu-common.h b/qemu-common.h
index 83571e0..1162b42 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -420,4 +420,12 @@ static inline bool is_power_of_2(int64_t value)
 
 #include "module.h"
 
+/*
+ * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+ * Input is limited to 14-bit numbers
+ */
+
+int uleb128_encode_small(uint8_t *out, uint32_t n);
+int uleb128_decode_small(const uint8_t *in, uint32_t *n);
+
 #endif
-- 
1.7.7.6

* [Qemu-devel] [PATCH v11 6/9] Add save_block_hdr function
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (4 preceding siblings ...)
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 5/9] Add uleb encoding/decoding functions Orit Wasserman
@ 2012-05-22 12:57 ` Orit Wasserman
  2012-06-01 11:04   ` Juan Quintela
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live Orit Wasserman
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, Petter Svard, stefanha, mdroth,
	Benoit Hudzia, blauwirbel, Orit Wasserman, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c |   26 ++++++++++++++------------
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 988adca..071dc8d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -157,6 +157,18 @@ static int is_dup_page(uint8_t *page)
     return 1;
 }
 
+static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
+        int cont, int flag)
+{
+    qemu_put_be64(f, offset | cont | flag);
+    if (!cont) {
+        qemu_put_byte(f, strlen(block->idstr));
+        qemu_put_buffer(f, (uint8_t *)block->idstr,
+                        strlen(block->idstr));
+    }
+
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -183,21 +195,11 @@ static int ram_save_block(QEMUFile *f)
             p = memory_region_get_ram_ptr(mr) + offset;
 
             if (is_dup_page(p)) {
-                qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
-                if (!cont) {
-                    qemu_put_byte(f, strlen(block->idstr));
-                    qemu_put_buffer(f, (uint8_t *)block->idstr,
-                                    strlen(block->idstr));
-                }
+                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
                 qemu_put_byte(f, *p);
                 bytes_sent = 1;
             } else {
-                qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
-                if (!cont) {
-                    qemu_put_byte(f, strlen(block->idstr));
-                    qemu_put_buffer(f, (uint8_t *)block->idstr,
-                                    strlen(block->idstr));
-                }
+                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
                 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
                 bytes_sent = TARGET_PAGE_SIZE;
             }
-- 
1.7.7.6

* [Qemu-devel] [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (5 preceding siblings ...)
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 6/9] Add save_block_hdr function Orit Wasserman
@ 2012-05-22 12:57 ` Orit Wasserman
  2012-06-01 11:42   ` Juan Quintela
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 8/9] Add set_cachesize command Orit Wasserman
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 9/9] Add XBZRLE statistics Orit Wasserman
  8 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, Petter Svard, stefanha, mdroth,
	Benoit Hudzia, blauwirbel, Orit Wasserman, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

In the outgoing migration, check whether the page is cached and changed;
if so, send it compressed by using the save_xbzrle_page function.
In the incoming migration, check whether RAM_SAVE_FLAG_XBZRLE is set
and decompress the page (by using the load_xbzrle function).

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c |  223 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 migration.c |   26 +++++++-
 migration.h |    8 ++
 savevm.c    |   91 ++++++++++++++++++++++++
 4 files changed, 332 insertions(+), 16 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 071dc8d..536d34c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -43,6 +43,15 @@
 #include "hw/smbios.h"
 #include "exec-memory.h"
 #include "hw/pcspk.h"
+#include "qemu/cache.h"
+
+#ifdef DEBUG_ARCH_INIT
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
 
 #ifdef TARGET_SPARC
 int graphic_width = 1024;
@@ -94,6 +103,7 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_PAGE     0x08
 #define RAM_SAVE_FLAG_EOS      0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
+#define RAM_SAVE_FLAG_XBZRLE   0x40
 
 #ifdef __ALTIVEC__
 #include <altivec.h>
@@ -157,6 +167,22 @@ static int is_dup_page(uint8_t *page)
     return 1;
 }
 
+/* XBZRLE (Xor Based Zero Run Length Encoding) */
+typedef struct XBZRLEHeader {
+    uint32_t xh_cksum;
+    uint16_t xh_len;
+    uint8_t xh_flags;
+} XBZRLEHeader;
+
+/* struct containing the XBZRLE cache and a static buffer
+   used by the compression */
+static struct {
+    /* buffer used for XBZRLE encoding */
+    uint8_t *encoded_buf;
+    /* Cache for XBZRLE */
+    Cache *cache;
+} XBZRLE = {0};
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
         int cont, int flag)
 {
@@ -169,19 +195,78 @@ static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
 
 }
 
+#define ENCODING_FLAG_XBZRLE 0x1
+
+static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
+                            ram_addr_t current_addr, RAMBlock *block,
+                            ram_addr_t offset, int cont)
+{
+    int encoded_len = 0, bytes_sent = -1, ret = -1;
+    XBZRLEHeader hdr = {0};
+    uint8_t *prev_cached_page;
+
+    /* check to see if the page is cached; if not, cache it and return */
+    if (!cache_is_cached(XBZRLE.cache, current_addr)) {
+        cache_insert(XBZRLE.cache, current_addr, g_memdup(current_data,
+                                                          TARGET_PAGE_SIZE));
+        goto done;
+    }
+
+    prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
+
+    /* XBZRLE encoding (if there is no overflow) */
+    encoded_len = xbzrle_encode_buffer(prev_cached_page, current_data,
+                                       TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
+                                       TARGET_PAGE_SIZE);
+    if (encoded_len == 0) {
+        bytes_sent = 0;
+        DPRINTF("Unmodified page, skipping\n");
+        goto done;
+    } else if (encoded_len == -1) {
+        bytes_sent = -1;
+        DPRINTF("Overflow\n");
+        /* update data in the cache */
+        memcpy(prev_cached_page, current_data, TARGET_PAGE_SIZE);
+        goto done;
+    }
+
+    /* we need to update the data in the cache; to get the same data the
+       receiver will have, we decode the encoded page on the cached data */
+    ret = xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len,
+                               prev_cached_page, TARGET_PAGE_SIZE);
+    g_assert(ret != -1);
+
+    hdr.xh_len = encoded_len;
+    hdr.xh_flags |= ENCODING_FLAG_XBZRLE;
+
+    /* Send XBZRLE based compressed page */
+    save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBZRLE);
+    qemu_put_byte(f, hdr.xh_flags);
+    qemu_put_be16(f, hdr.xh_len);
+    qemu_put_be32(f, hdr.xh_cksum);
+    qemu_put_buffer(f, XBZRLE.encoded_buf, encoded_len);
+    bytes_sent = encoded_len + sizeof(hdr);
+
+done:
+    return bytes_sent;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
-static int ram_save_block(QEMUFile *f)
+static int ram_save_block(QEMUFile *f, int stage)
 {
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
-    int bytes_sent = 0;
+    int bytes_sent = -1;
     MemoryRegion *mr;
+    ram_addr_t current_addr;
 
     if (!block)
         block = QLIST_FIRST(&ram_list.blocks);
 
+    current_addr = block->offset + offset;
+
     do {
         mr = block->mr;
         if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
@@ -198,7 +283,24 @@ static int ram_save_block(QEMUFile *f)
                 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
                 qemu_put_byte(f, *p);
                 bytes_sent = 1;
-            } else {
+            } else if (migrate_use_xbzrle()) {
+                /* in stage 1 none of the pages are cached, so we just cache
+                   them for the next stages and send the cached copy */
+                if (stage == 1) {
+                    cache_insert(XBZRLE.cache, current_addr,
+                                 g_memdup(p, TARGET_PAGE_SIZE));
+                } else {
+                    bytes_sent = save_xbzrle_page(f, p, current_addr, block,
+                                                  offset, cont);
+                }
+                /* send the cached page copy for stages 1 and 2 */
+                if (stage != 3) {
+                    p = get_cached_data(XBZRLE.cache, current_addr);
+                }
+            }
+
+            /* page not sent yet (no XBZRLE, cache miss or overflow) */
+            if (bytes_sent == -1) {
                 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
                 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
                 bytes_sent = TARGET_PAGE_SIZE;
@@ -292,6 +394,17 @@ static void sort_ram_list(void)
     g_free(blocks);
 }
 
+static void migration_end(void)
+{
+    memory_global_dirty_log_stop();
+
+    if (migrate_use_xbzrle()) {
+        cache_fini(XBZRLE.cache);
+        g_free(XBZRLE.cache);
+        XBZRLE.cache = NULL;
+    }
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
     ram_addr_t addr;
@@ -301,7 +414,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     int ret;
 
     if (stage < 0) {
-        memory_global_dirty_log_stop();
+        migration_end();
         return 0;
     }
 
@@ -314,6 +427,17 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
         last_offset = 0;
         sort_ram_list();
 
+        if (migrate_use_xbzrle()) {
+            XBZRLE.cache = cache_init(migrate_xbzrle_cache_size() /
+                                      TARGET_PAGE_SIZE,
+                                      TARGET_PAGE_SIZE);
+            if (!XBZRLE.cache) {
+                DPRINTF("Error creating cache\n");
+                return -1;
+            }
+            XBZRLE.encoded_buf = g_malloc0(TARGET_PAGE_SIZE);
+        }
+
         /* Make sure all dirty bits are set */
         QLIST_FOREACH(block, &ram_list.blocks, next) {
             for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
@@ -341,9 +465,12 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     while ((ret = qemu_file_rate_limit(f)) == 0) {
         int bytes_sent;
 
-        bytes_sent = ram_save_block(f);
-        bytes_transferred += bytes_sent;
-        if (bytes_sent == 0) { /* no more blocks */
+        bytes_sent = ram_save_block(f, stage);
+        /* bytes_sent 0 represents an unchanged page,
+           bytes_sent -1 represents no more blocks */
+        if (bytes_sent > 0) {
+            bytes_transferred += bytes_sent;
+        } else if (bytes_sent == -1) { /* no more blocks */
             break;
         }
     }
@@ -366,19 +493,62 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
         int bytes_sent;
 
         /* flush all remaining blocks regardless of rate limiting */
-        while ((bytes_sent = ram_save_block(f)) != 0) {
+        while ((bytes_sent = ram_save_block(f, stage)) != -1) {
             bytes_transferred += bytes_sent;
         }
-        memory_global_dirty_log_stop();
+        migration_end();
     }
 
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
 
+    DPRINTF("ram_save_live: expected(%ld) <= max(%ld)?\n", expected_time,
+        migrate_max_downtime());
+
     return (stage == 2) && (expected_time <= migrate_max_downtime());
 }
 
+static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
+{
+    int ret, rc = 0;
+    uint8_t *xbzrle_buf = NULL;
+    XBZRLEHeader hdr = {0};
+
+    /* extract RLE header */
+    hdr.xh_flags = qemu_get_byte(f);
+    hdr.xh_len = qemu_get_be16(f);
+    hdr.xh_cksum = qemu_get_be32(f);
+
+    if (!(hdr.xh_flags & ENCODING_FLAG_XBZRLE)) {
+        fprintf(stderr, "Failed to load XBZRLE page - wrong compression!\n");
+        return -1;
+    }
+
+    if (hdr.xh_len > TARGET_PAGE_SIZE) {
+        fprintf(stderr, "Failed to load XBZRLE page - len overflow!\n");
+        return -1;
+    }
+
+    /* load data and decode */
+    xbzrle_buf = g_malloc0(TARGET_PAGE_SIZE);
+    qemu_get_buffer(f, xbzrle_buf, hdr.xh_len);
+
+    /* decode RLE */
+    ret = xbzrle_decode_buffer(xbzrle_buf, hdr.xh_len, host, TARGET_PAGE_SIZE);
+    if (ret == -1) {
+        fprintf(stderr, "Failed to load XBZRLE page - decode error!\n");
+        rc = -1;
+    } else if (ret > TARGET_PAGE_SIZE) {
+        fprintf(stderr, "Failed to load XBZRLE page - size %d exceeds %d!\n",
+                ret, TARGET_PAGE_SIZE);
+        rc = -1;
+    }
+
+    g_free(xbzrle_buf);
+    return rc;
+}
+
 static inline void *host_from_stream_offset(QEMUFile *f,
                                             ram_addr_t offset,
                                             int flags)
@@ -412,8 +582,11 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
-    int flags;
+    int flags, ret = 0;
     int error;
+    static uint64_t seq_iter;
+
+    seq_iter++;
 
     if (version_id < 4 || version_id > 4) {
         return -EINVAL;
@@ -443,8 +616,10 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 
                     QLIST_FOREACH(block, &ram_list.blocks, next) {
                         if (!strncmp(id, block->idstr, sizeof(id))) {
-                            if (block->length != length)
-                                return -EINVAL;
+                            if (block->length != length) {
+                                ret = -EINVAL;
+                                goto done;
+                            }
                             break;
                         }
                     }
@@ -452,7 +627,8 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
                     if (!block) {
                         fprintf(stderr, "Unknown ramblock \"%s\", cannot "
                                 "accept migration\n", id);
-                        return -EINVAL;
+                        ret = -EINVAL;
+                        goto done;
                     }
 
                     total_ram_bytes -= length;
@@ -481,16 +657,33 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
             void *host;
 
             host = host_from_stream_offset(f, addr, flags);
+            if (!host) {
+                return -EINVAL;
+            }
 
             qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+        } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
+            void *host = host_from_stream_offset(f, addr, flags);
+            if (!host) {
+                return -EINVAL;
+            }
+
+            if (load_xbzrle(f, addr, host) < 0) {
+                ret = -EINVAL;
+                goto done;
+            }
         }
         error = qemu_file_get_error(f);
         if (error) {
-            return error;
+            ret = error;
+            goto done;
         }
     } while (!(flags & RAM_SAVE_FLAG_EOS));
 
-    return 0;
+done:
+    DPRINTF("Completed load of VM with exit code %d seq iteration %ld\n",
+            ret, seq_iter);
+    return ret;
 }
 
 #ifdef HAS_AUDIO
diff --git a/migration.c b/migration.c
index 952f542..92c39e8 100644
--- a/migration.c
+++ b/migration.c
@@ -43,6 +43,9 @@ enum {
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
+/* Migration XBZRLE cache size */
+#define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
+
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 
@@ -55,7 +58,8 @@ static MigrationState *migrate_get_current(void)
     static MigrationState current_migration = {
         .state = MIG_STATE_SETUP,
         .bandwidth_limit = MAX_THROTTLE,
-    };
+        .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
+    };
 
     return &current_migration;
 }
@@ -410,6 +414,7 @@ static MigrationState *migrate_init(const MigrationParams *params)
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
+    int64_t xbzrle_cache_size = s->xbzrle_cache_size;
 
     memcpy(enabled_capabilities, s->enabled_capabilities,
            sizeof(enabled_capabilities));
@@ -419,6 +424,7 @@ static MigrationState *migrate_init(const MigrationParams *params)
     s->params = *params;
     memcpy(s->enabled_capabilities, enabled_capabilities,
            sizeof(enabled_capabilities));
+    s->xbzrle_cache_size = xbzrle_cache_size;
 
     s->state = MIG_STATE_SETUP;
 
@@ -516,3 +522,21 @@ void qmp_migrate_set_downtime(double value, Error **errp)
     value = MAX(0, MIN(UINT64_MAX, value));
     max_downtime = (uint64_t)value;
 }
+
+int migrate_use_xbzrle(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
+}
+
+int64_t migrate_xbzrle_cache_size(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->xbzrle_cache_size;
+}
diff --git a/migration.h b/migration.h
index 00d1992..eb0b822 100644
--- a/migration.h
+++ b/migration.h
@@ -39,6 +39,7 @@ struct MigrationState
     void *opaque;
     MigrationParams params;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
+    int64_t xbzrle_cache_size;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -99,4 +100,11 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
+                         uint8_t *dst, int dlen);
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
+
+int migrate_use_xbzrle(void);
+int64_t migrate_xbzrle_cache_size(void);
+
 #endif
diff --git a/savevm.c b/savevm.c
index 42937a0..31db838 100644
--- a/savevm.c
+++ b/savevm.c
@@ -2374,3 +2374,94 @@ void vmstate_register_ram_global(MemoryRegion *mr)
 {
     vmstate_register_ram(mr, NULL);
 }
+
+/*
+  page = zrun nzrun
+       | zrun nzrun page
+
+  zrun = length
+
+  nzrun = length byte...
+
+  length = uleb128 encoded integer
+ */
+int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
+                         uint8_t *dst, int dlen)
+{
+    uint32_t zrun_len = 0, nzrun_len = 0;
+    int d = 0, i = 0;
+    uint8_t *nzrun_start = NULL;
+
+    while (i < slen) {
+        /* overflow */
+        if (d + 2 > dlen) {
+            return -1;
+        }
+
+        for (; i < slen && old_buf[i] == new_buf[i]; i++) {
+            zrun_len++;
+        }
+
+        /* buffer unchanged */
+        if (zrun_len == slen) {
+            return 0;
+        }
+
+        /* skip last zero run */
+        if (i == slen) {
+            return d;
+        }
+
+        d += uleb128_encode_small(dst + d, zrun_len);
+
+        zrun_len = 0;
+        nzrun_start = new_buf + i;
+        for (; i < slen && old_buf[i] != new_buf[i]; i++) {
+            nzrun_len++;
+        }
+
+        /* overflow */
+        if (d + nzrun_len + 2 > dlen) {
+            return -1;
+        }
+
+        d += uleb128_encode_small(dst + d, nzrun_len);
+        memcpy(dst + d, nzrun_start, nzrun_len);
+        d += nzrun_len;
+        nzrun_len = 0;
+    }
+
+    return d;
+}
+
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
+{
+    int i = 0, d = 0;
+    uint32_t count = 0;
+
+    while (i < slen) {
+
+        /* zrun */
+        i += uleb128_decode_small(src + i, &count);
+        d += count;
+
+        /* overflow */
+        g_assert(d <= dlen);
+
+        /* completed decoding */
+        if (i == slen) {
+            return d;
+        }
+
+        /* nzrun */
+        i += uleb128_decode_small(src + i, &count);
+
+        g_assert(d + count <= dlen && i + count <= slen);
+
+        memcpy(dst + d, src + i, count);
+        d += count;
+        i += count;
+    }
+
+    return d;
+}
-- 
1.7.7.6
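The format comment above `xbzrle_encode_buffer` (page = zrun nzrun | zrun nzrun page, with every length stored as ULEB128) can be exercised with a small standalone round trip. The sketch below is not the patch's code: `uleb128_get`, `put_len`, `xbz_encode` and `xbz_decode` are hypothetical names, every scan is bounded by `slen`, and the decoder returns -1 on malformed input instead of asserting. Like the patch, the encoder returns 0 for an unchanged page, returns -1 when the result would not fit in `dlen` bytes, and omits the trailing zero run; the decoder expects `dst` to already hold the old page, since zero runs are skipped rather than written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a ULEB128 value from at most len bytes.
 * Returns the number of bytes consumed, or -1 if truncated/too long. */
static int uleb128_get(const uint8_t *in, int len, uint32_t *v)
{
    uint32_t result = 0;
    int n;

    for (n = 0; n < len && n < 5; n++) {
        result |= (uint32_t)(in[n] & 0x7f) << (7 * n);
        if (!(in[n] & 0x80)) {
            *v = result;
            return n + 1;
        }
    }
    return -1;
}

/* Hypothetical helper: append v as ULEB128 at dst[d].
 * Returns the new offset, or -1 if dst (dlen bytes) is full. */
static int put_len(uint8_t *dst, int d, int dlen, uint32_t v)
{
    do {
        if (d >= dlen) {
            return -1;
        }
        dst[d] = v & 0x7f;
        v >>= 7;
        if (v) {
            dst[d] |= 0x80;        /* more bytes follow */
        }
        d++;
    } while (v);
    return d;
}

/* Encode new_buf against old_buf: 0 = unchanged page, -1 = overflow. */
int xbz_encode(const uint8_t *old_buf, const uint8_t *new_buf, int slen,
               uint8_t *dst, int dlen)
{
    int i = 0, d = 0;

    while (i < slen) {
        int start = i;
        /* zero run: bytes whose XOR with the old page is 0 */
        while (i < slen && old_buf[i] == new_buf[i]) {
            i++;
        }
        if (i == slen) {
            return d;              /* trailing zero run is not sent */
        }
        d = put_len(dst, d, dlen, (uint32_t)(i - start));
        if (d < 0) {
            return -1;
        }
        /* non-zero run: changed bytes, copied verbatim from new_buf */
        start = i;
        while (i < slen && old_buf[i] != new_buf[i]) {
            i++;
        }
        d = put_len(dst, d, dlen, (uint32_t)(i - start));
        if (d < 0 || d + (i - start) > dlen) {
            return -1;
        }
        memcpy(dst + d, new_buf + start, i - start);
        d += i - start;
    }
    return d;
}

/* Apply an encoded delta to dst, which must hold the old page.
 * Returns the number of page bytes covered, or -1 on malformed input. */
int xbz_decode(const uint8_t *src, int slen, uint8_t *dst, int dlen)
{
    int i = 0, d = 0, n;
    uint32_t count;

    while (i < slen) {
        n = uleb128_get(src + i, slen - i, &count);        /* zrun */
        if (n < 0 || count > (uint32_t)(dlen - d)) {
            return -1;
        }
        i += n;
        d += count;
        n = uleb128_get(src + i, slen - i, &count);        /* nzrun */
        if (n < 0 || count == 0 || count > (uint32_t)(dlen - d) ||
            count > (uint32_t)(slen - i - n)) {
            return -1;
        }
        i += n;
        memcpy(dst + d, src + i, count);
        i += count;
        d += count;
    }
    return d;
}
```

For example, changing bytes 3, 4 and 40 of a 64-byte page encodes to 7 bytes: zrun 3, nzrun 2 plus its 2 data bytes, zrun 35, nzrun 1 plus its 1 data byte.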

* [Qemu-devel] [PATCH v11 8/9] Add set_cachesize command
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (6 preceding siblings ...)
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live Orit Wasserman
@ 2012-05-22 12:57 ` Orit Wasserman
  2012-06-01 11:19   ` Juan Quintela
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 9/9] Add XBZRLE statistics Orit Wasserman
  8 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, Petter Svard, stefanha, mdroth,
	Benoit Hudzia, blauwirbel, Orit Wasserman, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Add a command to set the XBZRLE cache size in bytes (rounded down to the nearest power of 2).
If the XBZRLE cache is too small, there will be many cache misses.

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c      |    9 +++++++++
 hmp-commands.hx  |   18 ++++++++++++++++++
 hmp.c            |   13 +++++++++++++
 hmp.h            |    1 +
 migration.c      |   25 ++++++++++++++++++++++++-
 migration.h      |    2 ++
 qapi-schema.json |   16 ++++++++++++++++
 qmp-commands.hx  |   23 +++++++++++++++++++++++
 8 files changed, 106 insertions(+), 1 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 536d34c..cdcd24b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -24,6 +24,7 @@
 #include <stdint.h>
 #include <stdarg.h>
 #include <stdlib.h>
+#include <math.h>
 #ifndef _WIN32
 #include <sys/types.h>
 #include <sys/mman.h>
@@ -183,6 +184,14 @@ static struct {
     Cache *cache;
 } XBZRLE = {0};
 
+
+void xbzrle_cache_resize(int64_t new_size)
+{
+    if (XBZRLE.cache != NULL) {
+        cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE);
+    }
+}
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
         int cont, int flag)
 {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e14e7be..7e1a215 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -829,6 +829,24 @@ STEXI
 @item migrate_cancel
 @findex migrate_cancel
 Cancel the current VM migration.
+
+ETEXI
+
+    {
+        .name       = "migrate_set_cachesize",
+        .args_type  = "value:o",
+        .params     = "value",
+        .help       = "set cache size (in bytes) for XBZRLE migrations, "
+                      "the cache size will be rounded down to the nearest power of 2.\n"
+                      "The cache size affects the number of cache misses. "
+                      "In case of a high cache miss ratio, increase the cache size",
+        .mhandler.cmd = hmp_migrate_set_cachesize,
+    },
+
+STEXI
+@item migrate_set_cachesize @var{value}
+@findex migrate_set_cachesize
+Set the XBZRLE cache size to @var{value} bytes (rounded down to the nearest power of 2).
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index 9582400..e3bc3ca 100644
--- a/hmp.c
+++ b/hmp.c
@@ -751,6 +751,19 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict)
     qmp_migrate_set_downtime(value, NULL);
 }
 
+void hmp_migrate_set_cachesize(Monitor *mon, const QDict *qdict)
+{
+    int64_t value = qdict_get_int(qdict, "value");
+    Error *err = NULL;
+
+    qmp_migrate_set_cachesize(value, &err);
+    if (err) {
+        monitor_printf(mon, "%s\n", error_get_pretty(err));
+        error_free(err);
+        return;
+    }
+}
+
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict)
 {
     int64_t value = qdict_get_int(qdict, "value");
diff --git a/hmp.h b/hmp.h
index 5f9d842..9559559 100644
--- a/hmp.h
+++ b/hmp.h
@@ -53,6 +53,7 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_cachesize(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration.c b/migration.c
index 92c39e8..07fe335 100644
--- a/migration.c
+++ b/migration.c
@@ -22,6 +22,7 @@
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qmp-commands.h"
+#include <math.h>
 
 //#define DEBUG_MIGRATION
 
@@ -43,7 +44,7 @@ enum {
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
-/* Migration XBZRLE cache size */
+/* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
 static NotifierList migration_state_notifiers =
@@ -503,6 +504,28 @@ void qmp_migrate_cancel(Error **errp)
     migrate_fd_cancel(migrate_get_current());
 }
 
+void qmp_migrate_set_cachesize(int64_t value, Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+
+    /* Check for truncation */
+    if (value != (size_t)value) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+                  "exceeding address space");
+        return;
+    }
+
+    value = MIN(UINT64_MAX, value);
+
+    /* no change */
+    if (value == s->xbzrle_cache_size) {
+        return;
+    }
+
+    s->xbzrle_cache_size = value;
+    xbzrle_cache_resize(value);
+}
+
 void qmp_migrate_set_speed(int64_t value, Error **errp)
 {
     MigrationState *s;
diff --git a/migration.h b/migration.h
index eb0b822..cc7b433 100644
--- a/migration.h
+++ b/migration.h
@@ -107,4 +107,6 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
 
+void xbzrle_cache_resize(int64_t new_size);
+
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 2887c51..8816f01 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1357,6 +1357,22 @@
 { 'command': 'migrate_set_speed', 'data': {'value': 'int'} }
 
 ##
+# @migrate_set_cachesize
+#
+# Set XBZRLE cache size
+#
+# @value: cache size in bytes
+#
+# The size will be rounded down to the nearest power of 2.
+# The cache size can be modified before and during an ongoing migration.
+#
+# Returns: nothing on success
+#
+# Since: 1.2
+##
+{ 'command': 'migrate_set_cachesize', 'data': {'value': 'int'} }
+
+##
 # @ObjectPropertyInfo:
 #
 # @name: the name of the property
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7750f2f..dbc14ad 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -520,6 +520,29 @@ Example:
 <- { "return": {} }
 
 EQMP
+    {
+        .name       = "migrate_set_cachesize",
+        .args_type  = "value:o",
+        .mhandler.cmd_new = qmp_marshal_input_migrate_set_cachesize,
+    },
+
+SQMP
+migrate_set_cachesize
+---------------------
+
+Set the cache size to be used by XBZRLE migration; the cache size will be
+rounded down to the nearest power of 2.
+
+Arguments:
+
+- "value": cache size in bytes (json-int)
+
+Example:
+
+-> { "execute": "migrate_set_cachesize", "arguments": { "value": 536870912 } }
+<- { "return": {} }
+
+EQMP
 
     {
         .name       = "migrate_set_speed",
-- 
1.7.7.6
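The cover letter states that a cache size which is not a power of 2 is reduced to the nearest power of 2, and `xbzrle_cache_resize` hands `new_size / TARGET_PAGE_SIZE` pages to `cache_resize`. That arithmetic can be sketched as follows, assuming 4 KiB target pages. `pow2_floor`, `cache_pages_for` and `slot_for_addr` are hypothetical helpers, since this patch does not show where the rounding happens. The last one only illustrates one plausible benefit of a power-of-2 page count (a mask instead of a modulo when mapping a page address to a slot) and is not taken from the cache implementation.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumption: 4 KiB target pages */

/* Largest power of 2 that is <= v (0 when v == 0). */
static uint64_t pow2_floor(uint64_t v)
{
    uint64_t p = 1;

    if (v == 0) {
        return 0;
    }
    while (p <= v / 2) {
        p <<= 1;
    }
    return p;
}

/* Hypothetical: number of pages held by a cache of the requested bytes. */
static uint64_t cache_pages_for(uint64_t bytes)
{
    return pow2_floor(bytes) / SKETCH_PAGE_SIZE;
}

/* Hypothetical direct-mapped slot choice: with a power-of-2 page count,
 * "page number modulo n_pages" reduces to a bit mask. */
static uint64_t slot_for_addr(uint64_t addr, uint64_t n_pages)
{
    return (addr / SKETCH_PAGE_SIZE) & (n_pages - 1);
}
```

With these helpers a 100 MiB request becomes a 64 MiB cache of 16384 pages, while any request below one page yields 0 pages, which a real command would probably want to reject.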

* [Qemu-devel] [PATCH v11 9/9] Add XBZRLE statistics
  2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (7 preceding siblings ...)
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 8/9] Add set_cachesize command Orit Wasserman
@ 2012-05-22 12:57 ` Orit Wasserman
  2012-06-01 11:10   ` Juan Quintela
  8 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-05-22 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, aliguori, quintela, Petter Svard, stefanha, mdroth,
	Benoit Hudzia, blauwirbel, Orit Wasserman, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c      |   68 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 hmp.c            |   13 ++++++++++
 migration.c      |   12 +++++++++
 migration.h      |    9 +++++++
 qapi-schema.json |   27 +++++++++++++++++++--
 qmp-commands.hx  |   28 ++++++++++++++++++++++
 6 files changed, 153 insertions(+), 4 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index cdcd24b..588e15e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -192,8 +192,66 @@ void xbzrle_cache_resize(int64_t new_size)
     }
 }
 
+/* accounting */
+typedef struct AccountingInfo {
+    uint64_t dup_pages;
+    uint64_t norm_pages;
+    uint64_t xbzrle_bytes;
+    uint64_t xbzrle_pages;
+    uint64_t xbzrle_cache_miss;
+    uint64_t iterations;
+    uint64_t xbzrle_overflows;
+} AccountingInfo;
+
+static AccountingInfo acct_info;
+
+static void acct_clear(void)
+{
+    memset(&acct_info, 0, sizeof(acct_info));
+}
+
+uint64_t dup_mig_bytes_transferred(void)
+{
+    return acct_info.dup_pages * TARGET_PAGE_SIZE;
+}
+
+uint64_t dup_mig_pages_transferred(void)
+{
+    return acct_info.dup_pages;
+}
+
+uint64_t norm_mig_bytes_transferred(void)
+{
+    return acct_info.norm_pages * TARGET_PAGE_SIZE;
+}
+
+uint64_t norm_mig_pages_transferred(void)
+{
+    return acct_info.norm_pages;
+}
+
+uint64_t xbzrle_mig_bytes_transferred(void)
+{
+    return acct_info.xbzrle_bytes;
+}
+
+uint64_t xbzrle_mig_pages_transferred(void)
+{
+    return acct_info.xbzrle_pages;
+}
+
+uint64_t xbzrle_mig_pages_cache_miss(void)
+{
+    return acct_info.xbzrle_cache_miss;
+}
+
+uint64_t xbzrle_mig_pages_overflow(void)
+{
+    return acct_info.xbzrle_overflows;
+}
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
-        int cont, int flag)
+                           int cont, int flag)
 {
         qemu_put_be64(f, offset | cont | flag);
         if (!cont) {
@@ -218,6 +276,7 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
     if (!cache_is_cached(XBZRLE.cache, current_addr)) {
         cache_insert(XBZRLE.cache, current_addr, g_memdup(current_data,
                                                           TARGET_PAGE_SIZE));
+        acct_info.xbzrle_cache_miss++;
         goto done;
     }
 
@@ -234,6 +293,7 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
     } else if (encoded_len == -1) {
         bytes_sent = -1;
         DPRINTF("Overflow\n");
+        acct_info.xbzrle_overflows++;
         /* update data in the cache */
         memcpy(prev_cached_page, current_data, TARGET_PAGE_SIZE);
         goto done;
@@ -254,7 +314,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
     qemu_put_be16(f, hdr.xh_len);
     qemu_put_be32(f, hdr.xh_cksum);
     qemu_put_buffer(f, XBZRLE.encoded_buf, encoded_len);
+    acct_info.xbzrle_pages++;
     bytes_sent = encoded_len + sizeof(hdr);
+    acct_info.xbzrle_bytes += bytes_sent;
 
 done:
     return bytes_sent;
@@ -289,6 +351,7 @@ static int ram_save_block(QEMUFile *f, int stage)
             p = memory_region_get_ram_ptr(mr) + offset;
 
             if (is_dup_page(p)) {
+                acct_info.dup_pages++;
                 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
                 qemu_put_byte(f, *p);
                 bytes_sent = 1;
@@ -313,6 +376,7 @@ static int ram_save_block(QEMUFile *f, int stage)
                 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
                 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
                 bytes_sent = TARGET_PAGE_SIZE;
+                acct_info.norm_pages++;
             }
 
             break;
@@ -445,6 +509,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
                 return -1;
             }
             XBZRLE.encoded_buf = g_malloc0(TARGET_PAGE_SIZE);
+            acct_clear();
         }
 
         /* Make sure all dirty bits are set */
@@ -479,6 +544,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
            bytes_sent -1 represent no more blocks*/
         if (bytes_sent > 0) {
             bytes_transferred += bytes_sent;
+            acct_info.iterations++;
         } else if (bytes_sent == -1) { /* no more blocks */
             break;
         }
diff --git a/hmp.c b/hmp.c
index e3bc3ca..a718704 100644
--- a/hmp.c
+++ b/hmp.c
@@ -162,6 +162,19 @@ void hmp_info_migrate(Monitor *mon)
                        info->disk->total >> 10);
     }
 
+    if (info->has_cache) {
+        monitor_printf(mon, "cache size: %" PRIu64 " bytes\n",
+                       info->cache->cache_size);
+        monitor_printf(mon, "xbzrle transferred: %" PRIu64 " kbytes\n",
+                       info->cache->xbzrle_bytes >> 10);
+        monitor_printf(mon, "xbzrle pages: %" PRIu64 " pages\n",
+                       info->cache->xbzrle_pages);
+        monitor_printf(mon, "xbzrle cache miss: %" PRIu64 "\n",
+                       info->cache->xbzrle_cache_miss);
+        monitor_printf(mon, "xbzrle overflow: %" PRIu64 "\n",
+                       info->cache->xbzrle_overflow);
+    }
+
     qapi_free_MigrationInfo(info);
 }
 
diff --git a/migration.c b/migration.c
index 07fe335..84aff01 100644
--- a/migration.c
+++ b/migration.c
@@ -148,6 +148,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->ram->transferred = ram_bytes_transferred();
         info->ram->remaining = ram_bytes_remaining();
         info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->norm  = norm_mig_pages_transferred();
 
         if (blk_mig_active()) {
             info->has_disk = true;
@@ -156,6 +158,16 @@ MigrationInfo *qmp_query_migrate(Error **errp)
             info->disk->remaining = blk_mig_bytes_remaining();
             info->disk->total = blk_mig_bytes_total();
         }
+
+        if (migrate_use_xbzrle()) {
+            info->has_cache = true;
+            info->cache = g_malloc0(sizeof(*info->cache));
+            info->cache->cache_size = migrate_xbzrle_cache_size();
+            info->cache->xbzrle_bytes  = xbzrle_mig_bytes_transferred();
+            info->cache->xbzrle_pages  = xbzrle_mig_pages_transferred();
+            info->cache->xbzrle_cache_miss = xbzrle_mig_pages_cache_miss();
+            info->cache->xbzrle_overflow = xbzrle_mig_pages_overflow();
+        }
         break;
     case MIG_STATE_COMPLETED:
         info->has_status = true;
diff --git a/migration.h b/migration.h
index cc7b433..7443fe5 100644
--- a/migration.h
+++ b/migration.h
@@ -83,6 +83,15 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+uint64_t dup_mig_bytes_transferred(void);
+uint64_t dup_mig_pages_transferred(void);
+uint64_t norm_mig_bytes_transferred(void);
+uint64_t norm_mig_pages_transferred(void);
+uint64_t xbzrle_mig_bytes_transferred(void);
+uint64_t xbzrle_mig_pages_transferred(void);
+uint64_t xbzrle_mig_pages_overflow(void);
+uint64_t xbzrle_mig_pages_cache_miss(void);
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 8816f01..82177d6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -241,7 +241,28 @@
 # Since: 0.14.0.
 ##
 { 'type': 'MigrationStats',
-  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' } }
+  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int', 'duplicate': 'int', 'norm': 'int' } }
+
+##
+# @CacheStats
+#
+# Detailed XBZRLE migration cache statistics
+#
+# @cache_size: XBZRLE cache size in bytes
+#
+# @xbzrle_bytes: number of bytes already transferred to the target VM
+#
+# @xbzrle_pages: number of pages transferred to the target VM
+#
+# @xbzrle_cache_miss: number of cache misses
+#
+# @xbzrle_overflow: number of overflows
+#
+# Since: 1.2
+##
+{ 'type': 'CacheStats',
+  'data': {'cache_size': 'int', 'xbzrle_bytes': 'int', 'xbzrle_pages': 'int',
+           'xbzrle_cache_miss': 'int', 'xbzrle_overflow': 'int' } }
 
 ##
 # @MigrationInfo
@@ -264,8 +285,8 @@
 ##
 { 'type': 'MigrationInfo',
   'data': {'*status': 'str', '*ram': 'MigrationStats',
-           '*disk': 'MigrationStats', '*params': ['MigrationCapabilityInfo']} }
-
+           '*disk': 'MigrationStats', '*params': ['MigrationCapabilityInfo'],
+           '*cache': 'CacheStats'} }
 ##
 # @query-migrate
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index dbc14ad..0aff940 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2031,6 +2031,13 @@ The main json-object contains the following:
          - "transferred": amount transferred (json-int)
          - "remaining": amount remaining (json-int)
          - "total": total (json-int)
+- "cache": only present if "status" is "active" and XBZRLE is enabled.
+  It is a json-object with the following XBZRLE information:
+         - "cache_size": XBZRLE cache size in bytes (json-int)
+         - "xbzrle_bytes": total XBZRLE bytes transferred (json-int)
+         - "xbzrle_pages": number of XBZRLE compressed pages (json-int)
+         - "xbzrle_cache_miss": number of cache misses (json-int)
+         - "xbzrle_overflow": number of XBZRLE overflows (json-int)
 
 Examples:
 
@@ -2082,6 +2089,27 @@ Examples:
       }
    }
 
+5. Migration is being performed and XBZRLE is active:
+
+-> { "execute": "query-migrate" }
+<- {
+      "return":{
+         "status":"active",
+         "ram":{
+            "total":1057024,
+            "remaining":1053304,
+            "transferred":3720
+         },
+         "cache":{
+            "cache_size":1024,
+            "xbzrle_bytes":20971520,
+            "xbzrle_pages":2444343,
+            "xbzrle_cache_miss":2244,
+            "xbzrle_overflow":34434
+         }
+      }
+   }
+
 EQMP
 
     {
-- 
1.7.7.6
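The counters exported by this patch can be combined into a few derived figures when reading `query-migrate` output. A minimal sketch, assuming a hypothetical `XbzrleSample` struct filled from the "cache" object and 4 KiB target pages for the uncompressed comparison. Note that `xbzrle_bytes` includes the per-page header (the patch adds `encoded_len + sizeof(hdr)`), and pages found unchanged in the cache are not counted by any of these counters, so the miss ratio below is only approximate. None of these ratios is reported by QEMU itself.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumption: 4 KiB target pages */

/* Hypothetical snapshot of the "cache" counters from query-migrate. */
typedef struct XbzrleSample {
    uint64_t xbzrle_bytes;       /* "xbzrle_bytes" */
    uint64_t xbzrle_pages;       /* "xbzrle_pages" */
    uint64_t xbzrle_cache_miss;  /* "xbzrle_cache_miss" */
    uint64_t xbzrle_overflow;    /* "xbzrle_overflow" */
} XbzrleSample;

/* Average bytes on the wire per XBZRLE page (header included). */
static double avg_encoded_size(const XbzrleSample *s)
{
    return s->xbzrle_pages ? (double)s->xbzrle_bytes / s->xbzrle_pages : 0.0;
}

/* Bytes saved compared with sending the same pages uncompressed. */
static int64_t bytes_saved(const XbzrleSample *s)
{
    return (int64_t)(s->xbzrle_pages * SKETCH_PAGE_SIZE) -
           (int64_t)s->xbzrle_bytes;
}

/* Rough miss ratio over pages that were inserted, encoded or overflowed. */
static double approx_miss_ratio(const XbzrleSample *s)
{
    uint64_t seen = s->xbzrle_cache_miss + s->xbzrle_pages +
                    s->xbzrle_overflow;

    return seen ? (double)s->xbzrle_cache_miss / seen : 0.0;
}
```

A persistently high miss ratio is the situation the `migrate_set_cachesize` help text describes, where increasing the cache size should help.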

* Re: [Qemu-devel] [PATCH v11 2/9] Add migration capabilites
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 2/9] Add migration capabilites Orit Wasserman
@ 2012-05-22 13:08   ` Eric Blake
  2012-06-01 10:57   ` Juan Quintela
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Blake @ 2012-05-22 13:08 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, quintela, stefanha, qemu-devel, mdroth,
	blauwirbel, avi, pbonzini, chegu_vinod

On 05/22/2012 06:56 AM, Orit Wasserman wrote:
> Add migration capabiltes that can be queried by the management.
> The managment can query the source QEMU and the destination QEMU in order to
> verify both support some  migration capability (currently only XBZRLE).
> The managment can enable a capabilty for the next migration by using
> migrate_set_parameter command.
> 

>  ##
> +# @MigrationCapability
> +#
> +# Migration capabilities enumaration
> +#
> +# @xbzrle: current migration supports xbzrle
> +#
> +# Since: 1.1

1.2

> +##
> +{ 'enum': 'MigrationCapability',
> +  'data': ['xbzrle'] }
> +

> +
> +##
> +# @migrate_set_parameter
> +#
> +# Set the following migration parameters (like xbzrle )
> +##
> +# Since: 1.2
> +##
> +{ 'command': 'migrate-set-parameter', 'data': { 'parameter': 'str' } }

This requires libvirt to issue multiple monitor commands in a row.
Also, it's pretty poorly typed.  It might be nicer to provide:

{ 'command': 'migrate-set-parameters',
  'data': { 'parameters': ['MigrationCapability'] } }

so that I can set multiple parameters in one call.
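Combined with Juan's later question in this thread about disabling capabilities, the list-based suggestion could look roughly like the handler logic below. Each entry names a capability and the desired state, all entries are validated before any is applied, and the call is refused while a migration is active (as `qmp_migrate_set_parameter` does). This is only a sketch: `MigCap`, `CapabilityState` and `set_capabilities` are hypothetical, and the QAPI-generated types and `Error` reporting are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the QAPI-generated MigrationCapability enum. */
typedef enum {
    MIG_CAP_XBZRLE = 0,
    MIG_CAP_MAX,
} MigCap;

/* Hypothetical list element: which capability, and on or off. */
typedef struct CapabilityState {
    MigCap capability;
    bool state;                 /* true = enable, false = disable */
} CapabilityState;

/* Apply every entry or none.  Returns 0 on success, -1 if a migration
 * is active or any capability is unknown (enabled[] is then untouched). */
static int set_capabilities(bool enabled[MIG_CAP_MAX], bool migration_active,
                            const CapabilityState *caps, size_t n)
{
    size_t i;

    if (migration_active) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        if ((unsigned)caps[i].capability >= MIG_CAP_MAX) {
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        enabled[caps[i].capability] = caps[i].state;
    }
    return 0;
}
```

Carrying a boolean per entry keeps one call for several capabilities while also giving a way to turn one off.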

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH v11 3/9] Add XBZRLE documentation
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 3/9] Add XBZRLE documentation Orit Wasserman
@ 2012-05-22 13:13   ` Eric Blake
  2012-06-01 10:58   ` Juan Quintela
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Blake @ 2012-05-22 13:13 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, quintela, stefanha, qemu-devel, mdroth,
	blauwirbel, avi, pbonzini, chegu_vinod

On 05/22/2012 06:56 AM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  docs/xbzrle.txt |  114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 114 insertions(+), 0 deletions(-)
>  create mode 100644 docs/xbzrle.txt
> 
> diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
> new file mode 100644
> index 0000000..16950d5
> --- /dev/null
> +++ b/docs/xbzrle.txt
> @@ -0,0 +1,114 @@
> +XBZRLE (Xor Based Zero Run Length Encoding)
> +===========================================
> +
> +Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction of VM
> +downtime and the total live-migration time of Virtual machines.
> +It is particularly useful for virtual machines running memory write intensive
> +workloads that are typical of large enterprise applications such as SAP ERP
> +Systems, and generally speaking for any application that uses a sparse memory
> +update pattern.
> +
> +Instead of sending the changed guest memory page this solution will send a
> +compressed version of the updates, thus reducing the amount of data sent during
> +live migration.
> +In order to be able to calculate the update, the previous memory pages needed to

s/needed/need/

> +be stored. Those pages are stored in a dedicated cache (hash table) and are

s/stored./stored on the source./

> +Format
> +=======
> +
> +The compression format uses the zero value, where zero represents an unchanged
> +value.

s/uses the zero value/performs an XOR between the previous and current
content of the page/

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH v11 1/9] Add MigrationParams structure
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 1/9] Add MigrationParams structure Orit Wasserman
@ 2012-06-01 10:51   ` Juan Quintela
  0 siblings, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 10:51 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, mdroth, blauwirbel,
	Isaku Yamahata, chegu_vinod, avi, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> From: Isaku Yamahata <yamahata@valinux.co.jp>
>
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

Reviewed-by: Juan Quintela <quintela@redhat.com>
> @@ -1570,7 +1571,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
>          if(se->set_params == NULL) {
>              continue;
>  	}
> -	se->set_params(blk_enable, shared, se->opaque);
> +        se->set_params(params, se->opaque);

Extra spaces here.

Later, Juan.

* Re: [Qemu-devel] [PATCH v11 2/9] Add migration capabilites
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 2/9] Add migration capabilites Orit Wasserman
  2012-05-22 13:08   ` Eric Blake
@ 2012-06-01 10:57   ` Juan Quintela
  2012-06-06  1:48     ` Orit Wasserman
  1 sibling, 1 reply; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 10:57 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, mdroth, blauwirbel,
	chegu_vinod, avi, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> Add migration capabiltes that can be queried by the management.
> The managment can query the source QEMU and the destination QEMU in order to
> verify both support some  migration capability (currently only XBZRLE).
> The managment can enable a capabilty for the next migration by using
> migrate_set_parameter command.
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> +void qmp_migrate_set_parameter(const char *parameter, Error **errp)
> +{
> +    MigrationState *s = migrate_get_current();
> +    int i;
> +
> +    if (s->state == MIG_STATE_ACTIVE) {
> +        error_set(errp, QERR_MIGRATION_ACTIVE);
> +        return;
> +    }
> +
> +    for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
> +        if (strcmp(parameter, MigrationCapability_lookup[i]) == 0) {
> +            s->enabled_capabilities[i] = true;
> +            return;
> +        }
> +    }
> +
> +    error_set(errp, QERR_INVALID_PARAMETER, parameter);
> +}

Two things here:
- Is there a way to disable capabilities?  It seems not.
- Would we want capabilities that are not "bool" in the future?  Just
  asking out loud; I haven't thought a lot about this.  Making it a
  parameter would make it trivial to address the previous comment:
  cap:true vs cap:false, or whatever syntax we want.

>      memset(s, 0, sizeof(*s));
>      s->bandwidth_limit = bandwidth_limit;
>      s->params = *params;
> +    memcpy(s->enabled_capabilities, enabled_capabilities,
> +           sizeof(enabled_capabilities));
>  
> -    s->bandwidth_limit = bandwidth_limit;
>      s->state = MIG_STATE_SETUP;

Nice catch/cleanup.


> diff --git a/savevm.c b/savevm.c
> index dd66f2c..42937a0 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1711,7 +1711,7 @@ static int qemu_savevm_state(QEMUFile *f)
>      int ret;
>      MigrationParams params = {
>          .blk = 0,
> -        .shared = 0
> +        .shared = 0,
>      };
>  
>      if (qemu_savevm_state_blocked(NULL)) {

Does this belong in the previous patch?

Later, Juan.

* Re: [Qemu-devel] [PATCH v11 3/9] Add XBZRLE documentation
  2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 3/9] Add XBZRLE documentation Orit Wasserman
  2012-05-22 13:13   ` Eric Blake
@ 2012-06-01 10:58   ` Juan Quintela
  1 sibling, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 10:58 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, mdroth, blauwirbel,
	chegu_vinod, avi, pbonzini, eblake


> +This work was originally based on research results published 

checkpatch complains about the trailing space on this line.  The rest of the patch looks nice.

* Re: [Qemu-devel] [PATCH v11 4/9] Add cache handling functions
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 4/9] Add cache handling functions Orit Wasserman
@ 2012-06-01 11:01   ` Juan Quintela
  0 siblings, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 11:01 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, Benoit Hudzia,
	mdroth, blauwirbel, Petter Svard, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> Add LRU page cache mechanism.
> The page are accessed by their address.
>
> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
> Signed-off-by: Petter Svard <petters@cs.umu.se>
> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com> 

* Re: [Qemu-devel] [PATCH v11 5/9] Add uleb encoding/decoding functions
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 5/9] Add uleb encoding/decoding functions Orit Wasserman
@ 2012-06-01 11:04   ` Juan Quintela
  0 siblings, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 11:04 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, mdroth, blauwirbel,
	chegu_vinod, avi, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> Implement Unsigned Little Endian Base 128.
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com> 

* Re: [Qemu-devel] [PATCH v11 6/9] Add save_block_hdr function
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 6/9] Add save_block_hdr function Orit Wasserman
@ 2012-06-01 11:04   ` Juan Quintela
  0 siblings, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 11:04 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, Benoit Hudzia,
	mdroth, blauwirbel, Petter Svard, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
> Signed-off-by: Petter Svard <petters@cs.umu.se>
> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

* Re: [Qemu-devel] [PATCH v11 9/9] Add XBZRLE statistics
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 9/9] Add XBZRLE statistics Orit Wasserman
@ 2012-06-01 11:10   ` Juan Quintela
  0 siblings, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 11:10 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, Benoit Hudzia,
	mdroth, blauwirbel, Petter Svard, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
> Signed-off-by: Petter Svard <petters@cs.umu.se>
> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

* Re: [Qemu-devel] [PATCH v11 8/9] Add set_cachesize command
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 8/9] Add set_cachesize command Orit Wasserman
@ 2012-06-01 11:19   ` Juan Quintela
  2012-06-06  2:14     ` Orit Wasserman
  0 siblings, 1 reply; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 11:19 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, Benoit Hudzia,
	mdroth, blauwirbel, Petter Svard, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> Change XBZRLE cache size in bytes (the size should be a power of 2).
> If XBZRLE cache size is too small there will be many cache miss.
>
> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
> Signed-off-by: Petter Svard <petters@cs.umu.se>
> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

> +void qmp_migrate_set_cachesize(int64_t value, Error **errp)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    /* Check for truncation */
> +    if (value != (size_t)value) {
> +        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
> +                  "exceeding address space");
> +        return;
> +    }
> +
> +    value = MIN(UINT64_MAX, value);

This looks fishy to say the least.  value is signed.  Is there any way
that UINT64_MAX is going to be smaller than value?
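On the question above: `MIN(UINT64_MAX, value)` compares after converting `value` to `uint64_t`, so `UINT64_MAX` is never the smaller operand and the line leaves `value` unchanged. Negative sizes, on the other hand, pass the `(size_t)` truncation check on a 64-bit host (both sides convert to the same unsigned value) and would then reach `xbzrle_cache_resize`. A minimal validation sketch, using a hypothetical `validate_cache_size` helper with plain return codes in place of `Error **errp`:

```c
#include <stdint.h>

/* Hypothetical check: 0 if value is usable as a cache size, -1 if not. */
static int validate_cache_size(int64_t value)
{
    if (value < 0) {
        return -1;                  /* negative sizes make no sense */
    }
    if ((uint64_t)value > SIZE_MAX) {
        return -1;                  /* would be truncated by size_t */
    }
    return 0;
}
```

Whether 0, or a size smaller than one target page, should also be rejected is a separate policy choice the sketch leaves open.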

* Re: [Qemu-devel] [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live
  2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live Orit Wasserman
@ 2012-06-01 11:42   ` Juan Quintela
  2012-06-06  2:13     ` Orit Wasserman
  0 siblings, 1 reply; 26+ messages in thread
From: Juan Quintela @ 2012-06-01 11:42 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, Benoit Hudzia,
	mdroth, blauwirbel, Petter Svard, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> On outgoing migration, check whether the page is cached and has
> changed; if so, send the compressed page using the save_xbzrle_page function.
> On incoming migration, check whether RAM_SAVE_FLAG_XBZRLE is set
> and, if so, decompress the page (using the load_xbrle function).


This patch could be split to ease review.
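The encode/decode flow in the commit message can be illustrated with a minimal, self-contained sketch. It assumes the format described in this series (alternating ULEB128-encoded zero-run and non-zero-run lengths, each non-zero run followed by the new bytes, and no trailing zero run) and the return convention of the quoted code (0 for an unchanged page, -1 on overflow). `sketch_encode()`, `sketch_decode()` and the `uleb128_*` helpers are illustrative stand-ins, not the patch's xbzrle_encode_buffer()/xbzrle_decode_buffer().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ULEB128: 7 bits per byte, high bit set on every byte except the last. */
static int uleb128_put(uint8_t *out, int avail, uint32_t v)
{
    int n = 0;

    do {
        uint8_t byte = v & 0x7f;

        if (n >= avail) {
            return -1;
        }
        v >>= 7;
        out[n++] = byte | (v ? 0x80 : 0);
    } while (v);
    return n;
}

static int uleb128_get(const uint8_t *in, int avail, uint32_t *v)
{
    uint32_t result = 0;
    int shift = 0, n = 0;
    uint8_t byte;

    do {
        if (n >= avail || shift > 28) {
            return -1;
        }
        byte = in[n++];
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    *v = result;
    return n;
}

/* Returns the encoded length, 0 if the page is unchanged, or -1 if the
 * encoding would not fit in dlen bytes (the caller then sends the page). */
static int sketch_encode(const uint8_t *old_buf, const uint8_t *new_buf,
                         int len, uint8_t *dst, int dlen)
{
    int i = 0, d = 0, n;

    while (i < len) {
        int zrun = 0, nzrun = 0;

        while (i < len && old_buf[i] == new_buf[i]) {
            zrun++;
            i++;
        }
        if (i == len) {
            break;                      /* the last zero run is not sent */
        }
        while (i + nzrun < len && old_buf[i + nzrun] != new_buf[i + nzrun]) {
            nzrun++;
        }
        if ((n = uleb128_put(dst + d, dlen - d, zrun)) < 0) {
            return -1;
        }
        d += n;
        if ((n = uleb128_put(dst + d, dlen - d, nzrun)) < 0) {
            return -1;
        }
        d += n;
        if (dlen - d < nzrun) {
            return -1;
        }
        memcpy(dst + d, new_buf + i, nzrun);   /* the changed bytes */
        d += nzrun;
        i += nzrun;
    }
    return d;
}

/* Applies an encoded delta to old_buf in place; returns 0, or -1 if malformed. */
static int sketch_decode(const uint8_t *src, int slen, uint8_t *old_buf, int len)
{
    int s = 0, i = 0, n;
    uint32_t zrun, nzrun;

    while (s < slen) {
        if ((n = uleb128_get(src + s, slen - s, &zrun)) < 0) {
            return -1;
        }
        s += n;
        if ((n = uleb128_get(src + s, slen - s, &nzrun)) < 0) {
            return -1;
        }
        s += n;
        if (zrun > (uint32_t)(len - i) || nzrun > (uint32_t)(len - i) - zrun ||
            nzrun > (uint32_t)(slen - s)) {
            return -1;
        }
        i += zrun;
        memcpy(old_buf + i, src + s, nzrun);
        i += nzrun;
        s += nzrun;
    }
    return 0;
}
```

Decoding in place on the old copy mirrors how the quoted save_xbzrle_page() refreshes its cached page after a successful encode.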

> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -43,6 +43,15 @@
>  #include "hw/smbios.h"
>  #include "exec-memory.h"
>  #include "hw/pcspk.h"
> +#include "qemu/cache.h"
> +
> +#ifdef DEBUG_ARCH_INIT
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif

Independent of xbzrle.

>  
>  #ifdef TARGET_SPARC
>  int graphic_width = 1024;
> @@ -94,6 +103,7 @@ const uint32_t arch_type = QEMU_ARCH;
>  #define RAM_SAVE_FLAG_PAGE     0x08
>  #define RAM_SAVE_FLAG_EOS      0x10
>  #define RAM_SAVE_FLAG_CONTINUE 0x20
> +#define RAM_SAVE_FLAG_XBZRLE   0x40
>  
>  #ifdef __ALTIVEC__
>  #include <altivec.h>
> @@ -157,6 +167,22 @@ static int is_dup_page(uint8_t *page)
>      return 1;
>  }
>  
> +/* XBZRLE (Xor Based Zero Length Encoding */
> +typedef struct XBZRLEHeader {
> +    uint32_t xh_cksum;

We are still not using this value, and we are sending it anyway (with a
value of zero).  What happens when we start using it for a checksum, and
we migrate to a new version that "expects" it to be valid?  I would
prefer not to send it, or to send the correct value.
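To make the wire format concrete, here is a sketch of the bytes the quoted code writes for the header: the flags byte, then xh_len as a big-endian 16-bit value, then xh_cksum as a big-endian 32-bit value, 7 bytes in all. `xbzrle_hdr_put()` is a hypothetical helper, not a QEMU function. On common ABIs `sizeof(XBZRLEHeader)` is 8 because of struct padding, so `bytes_sent = encoded_len + sizeof(hdr)` would count one byte more than is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as XBZRLEHeader in the quoted patch. */
typedef struct XBZRLEHeader {
    uint32_t xh_cksum;
    uint16_t xh_len;
    uint8_t xh_flags;
} XBZRLEHeader;

/* Hypothetical helper: serialize the header in the order the patch writes
 * it (qemu_put_byte, qemu_put_be16, qemu_put_be32); returns bytes written. */
static size_t xbzrle_hdr_put(uint8_t *buf, const XBZRLEHeader *hdr)
{
    size_t n = 0;

    buf[n++] = hdr->xh_flags;
    buf[n++] = (uint8_t)(hdr->xh_len >> 8);     /* big-endian 16-bit length */
    buf[n++] = (uint8_t)hdr->xh_len;
    buf[n++] = (uint8_t)(hdr->xh_cksum >> 24);  /* big-endian 32-bit checksum */
    buf[n++] = (uint8_t)(hdr->xh_cksum >> 16);
    buf[n++] = (uint8_t)(hdr->xh_cksum >> 8);
    buf[n++] = (uint8_t)hdr->xh_cksum;
    return n;
}
```

Every one of the last four bytes is currently zero, which is exactly the concern: a future reader cannot tell "no checksum" from "checksum of 0".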

> +    uint16_t xh_len;
> +    uint8_t xh_flags;
> +} XBZRLEHeader;
> +
> +/* struct contains XBZRLE cache and a static page
> +   used by the compression */
> +static struct {
> +    /* buffer used for XBZRLE encoding */
> +    uint8_t *encoded_buf;
> +    /* Cache for XBZRLE */
> +    Cache *cache;
> +} XBZRLE = {0};

Use C99 designated initializers, please.

{ .encoded_buf = NULL,
  .cache = NULL,
}

More instances in other parts.
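For reference, a sketch of the C99 designated-initializer style being requested, applied to the two initializations in the quoted hunks. Members not named in a designated initializer are zero-initialized, so the `XBZRLEHeader hdr = {0};` followed by `hdr.xh_flags |= ...` pattern can become a single initializer. `make_hdr()` is a hypothetical helper, and `Cache` is left as an opaque type here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENCODING_FLAG_XBZRLE 0x1

typedef struct Cache Cache;   /* opaque in this sketch */

typedef struct XBZRLEHeader {
    uint32_t xh_cksum;
    uint16_t xh_len;
    uint8_t xh_flags;
} XBZRLEHeader;

/* Each member is named explicitly instead of relying on "= {0}". */
static struct {
    uint8_t *encoded_buf;   /* buffer used for XBZRLE encoding */
    Cache *cache;           /* cache for XBZRLE */
} XBZRLE = {
    .encoded_buf = NULL,
    .cache = NULL,
};

/* Unnamed members (xh_cksum here) are zero-initialized. */
static XBZRLEHeader make_hdr(uint16_t len)
{
    XBZRLEHeader hdr = {
        .xh_len = len,
        .xh_flags = ENCODING_FLAG_XBZRLE,
    };

    return hdr;
}
```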

> +
>  static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>          int cont, int flag)
>  {
> @@ -169,19 +195,78 @@ static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>  
>  }
>  
> +#define ENCODING_FLAG_XBZRLE 0x1
> +
> +static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
> +                            ram_addr_t current_addr, RAMBlock *block,
> +                            ram_addr_t offset, int cont)
> +{
> +    int encoded_len = 0, bytes_sent = -1, ret = -1;
> +    XBZRLEHeader hdr = {0};
> +    uint8_t *prev_cached_page;
> +
> +    /* check to see if page is cached , if not cache and return */
> +    if (!cache_is_cached(XBZRLE.cache, current_addr)) {
> +        cache_insert(XBZRLE.cache, current_addr, g_memdup(current_data,
> +                                                          TARGET_PAGE_SIZE));
> +        goto done;
> +    }
> +
> +    prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
> +
> +    /* XBZRLE encoding (if there is no overflow) */
> +    encoded_len = xbzrle_encode_buffer(prev_cached_page, current_data,
> +                                       TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
> +                                       TARGET_PAGE_SIZE);
> +    if (encoded_len == 0) {
> +        bytes_sent = 0;
> +        DPRINTF("Unmodifed page or overflow skipping\n");
> +        goto done;
> +    } else if (encoded_len == -1) {
> +        bytes_sent = -1;
> +        DPRINTF("Overflow\n");
> +        /* update data in the cache */
> +        memcpy(prev_cached_page, current_data, TARGET_PAGE_SIZE);
> +        goto done;
> +    }
> +
> +    /* we need to update the data in the cache, in order to get the same data
> +       we cached we decode the encoded page on the cached data */
> +    ret = xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len,
> +                               prev_cached_page, TARGET_PAGE_SIZE);
> +    g_assert(ret != -1);
> +
> +    hdr.xh_len = encoded_len;
> +    hdr.xh_flags |= ENCODING_FLAG_XBZRLE;
> +
> +    /* Send XBZRLE based compressed page */
> +    save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBZRLE);
> +    qemu_put_byte(f, hdr.xh_flags);
> +    qemu_put_be16(f, hdr.xh_len);
> +    qemu_put_be32(f, hdr.xh_cksum);
> +    qemu_put_buffer(f, XBZRLE.encoded_buf, encoded_len);
> +    bytes_sent = encoded_len + sizeof(hdr);
> +
> +done:
> +    return bytes_sent;
> +}
> +
>  static RAMBlock *last_block;
>  static ram_addr_t last_offset;
>  
> -static int ram_save_block(QEMUFile *f)
> +static int ram_save_block(QEMUFile *f, int stage)
>  {
>      RAMBlock *block = last_block;
>      ram_addr_t offset = last_offset;
> -    int bytes_sent = 0;
> +    int bytes_sent = -1;
>      MemoryRegion *mr;
> +    ram_addr_t current_addr;
>  
>      if (!block)
>          block = QLIST_FIRST(&ram_list.blocks);
>  
> +    current_addr = block->offset + offset;
> +
>      do {
>          mr = block->mr;
>          if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
> @@ -198,7 +283,24 @@ static int ram_save_block(QEMUFile *f)
>                  save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
>                  qemu_put_byte(f, *p);
>                  bytes_sent = 1;
> -            } else {
> +            } else if (migrate_use_xbzrle()) {
> +                /* in stage 1 none of the pages are cached so we just want to
> +                   cache them for next stages, and send the cached copy */
> +                if (stage == 1) {
> +                    cache_insert(XBZRLE.cache, current_addr,
> +                                 g_memdup(p, TARGET_PAGE_SIZE));
> +                } else {
> +                    bytes_sent = save_xbzrle_page(f, p, current_addr, block,
> +                                                  offset, cont);
> +                }
> +                /* send the cached page copy for stage 1 and 2*/
> +                if (stage != 3) {
> +                    p = get_cached_data(XBZRLE.cache, current_addr);
> +                }
> +            }
> +
> +            /* either we didn't send yet (we may got XBZRLE overflow) */
> +            if (bytes_sent == -1) {
>                  save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
>                  qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
>                  bytes_sent = TARGET_PAGE_SIZE;


I think that code is not right when save_xbzrle_page() returns 0.  That
means the page hasn't changed since the last time we sent it.  We
shouldn't break in that case.  Just continue with the next page, right?
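A simplified model of the control flow being suggested, assuming save_xbzrle_page()'s return convention from the quoted code (-1: nothing sent, fall back to a full page; 0: page unchanged; >0: bytes sent). `save_one_page()`, the `try_xbzrle_fn` callback, `stub_xbzrle()` and the fixed 4096-byte page are hypothetical stand-ins, and dirty-bit handling is omitted. The point is only that a 0 result moves on to the next dirty page instead of ending the call.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096   /* hypothetical stand-in for TARGET_PAGE_SIZE */

/* Return convention assumed from save_xbzrle_page() in the quoted patch:
 *   -1  nothing sent (page not cached yet, or the encoding overflowed)
 *    0  page unchanged since the cached copy, nothing to send
 *   >0  number of bytes sent for the XBZRLE-encoded page          */
typedef int (*try_xbzrle_fn)(size_t page);

/* Scan from *pos, send at most one page, and return the bytes sent
 * (0 once no dirty page is left). Dirty-bit clearing is omitted. */
static int save_one_page(const int *dirty, size_t npages, size_t *pos,
                         try_xbzrle_fn try_xbzrle)
{
    for (; *pos < npages; (*pos)++) {
        int bytes_sent;

        if (!dirty[*pos]) {
            continue;
        }
        bytes_sent = try_xbzrle(*pos);
        if (bytes_sent == 0) {
            continue;                       /* unchanged: try the next page */
        }
        if (bytes_sent == -1) {
            bytes_sent = SKETCH_PAGE_SIZE;  /* fall back to a full page */
        }
        (*pos)++;                           /* resume after this page */
        return bytes_sent;
    }
    return 0;
}

/* Example callback: page 0 unchanged, page 1 overflowed, page 2 -> 100 bytes. */
static int stub_xbzrle(size_t page)
{
    static const int result[] = { 0, -1, 100 };

    return result[page];
}
```

With the stub, the first call skips the unchanged page 0 and sends page 1 in full; the second call sends the 100-byte delta for page 2.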

On the other hand... why are we doing the stage == 1 test?  Stage 1
normally sends only part of the pages, so couldn't we use the generic code
there?  It would just return -1 as bytes_sent and run the same code
that we have now?

Isn't the optimization for stage 3 done backwards?  We are inserting
the page into the cache even when we are in stage 3.  In stage 3 we
should:
- check whether the page is in the cache: if so, do the usual XBZRLE trick
- if it is not, just send the whole page without inserting it into the
cache?  We are never going to reuse it, so putting it into the cache
would not help us at all.  Aren't we just making an extra copy?
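The per-page decision proposed here can be summarized as a small function, assuming the stage-1 special case is dropped as suggested in the previous paragraph. `PageAction` and `choose_action()` are hypothetical names for illustration; the patch performs these steps inline in ram_save_block().

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SEND_FULL_AND_CACHE,   /* copy the page into the cache, send it whole */
    SEND_XBZRLE,           /* cached copy exists: send the encoded delta  */
    SEND_FULL_NO_CACHE,    /* send it whole, leave the cache untouched    */
} PageAction;

/* Stage 3 is the last pass, so a page inserted into the cache there
 * can never be reused: skip the insertion and the extra copy. */
static PageAction choose_action(int stage, bool cached)
{
    if (cached) {
        return SEND_XBZRLE;
    }
    return stage == 3 ? SEND_FULL_NO_CACHE : SEND_FULL_AND_CACHE;
}
```

Since the cache is normally empty during stage 1, the generic path already yields SEND_FULL_AND_CACHE there without a dedicated branch.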


>  
>      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>  
>      expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>  
> +    DPRINTF("ram_save_live: expected(%ld) <= max(%ld)?\n", expected_time,
> +        migrate_max_downtime());
> +

This belongs in the debugging patch.

> +    /* load data and decode */
> +    xbzrle_buf = g_malloc0(TARGET_PAGE_SIZE);

Can't we have a static buffer of that size and avoid all the
malloc/free business?  If space is tight, we can always put it in the
XBZRLE structure and allocate it only for migration.
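A sketch of the second option, assuming a hypothetical `XBZRLE_LOAD` state struct and hypothetical `xbzrle_load_setup()`/`xbzrle_load_cleanup()` helpers: the decode buffer is allocated once per incoming migration and reused for every page. It uses calloc()/free() to stay self-contained; QEMU code would use g_malloc0()/g_free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096   /* stand-in for TARGET_PAGE_SIZE */

/* Decode buffer kept alongside the other XBZRLE state. */
static struct {
    uint8_t *decoded_buf;
} XBZRLE_LOAD = {
    .decoded_buf = NULL,
};

/* Allocate once when the incoming migration starts; later calls reuse it. */
static int xbzrle_load_setup(void)
{
    if (!XBZRLE_LOAD.decoded_buf) {
        XBZRLE_LOAD.decoded_buf = calloc(1, SKETCH_PAGE_SIZE);
    }
    return XBZRLE_LOAD.decoded_buf ? 0 : -1;
}

/* Free when the incoming migration ends. */
static void xbzrle_load_cleanup(void)
{
    free(XBZRLE_LOAD.decoded_buf);
    XBZRLE_LOAD.decoded_buf = NULL;
}
```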

> @@ -481,16 +657,33 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
>              void *host;
>  
>              host = host_from_stream_offset(f, addr, flags);
> +            if (!host) {
> +                return -EINVAL;
> +            }

Why is this check only needed now?

Later, Juan.

* Re: [Qemu-devel] [PATCH v11 2/9] Add migration capabilites
  2012-06-01 10:57   ` Juan Quintela
@ 2012-06-06  1:48     ` Orit Wasserman
  2012-06-07 10:41       ` Juan Quintela
  0 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-06-06  1:48 UTC
  To: quintela
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, mdroth, blauwirbel,
	chegu_vinod, avi, pbonzini, eblake

On 06/01/2012 01:57 PM, Juan Quintela wrote:
> Orit Wasserman <owasserm@redhat.com> wrote:
>> Add migration capabilities that can be queried by management.
>> Management can query the source QEMU and the destination QEMU in order to
>> verify that both support a given migration capability (currently only XBZRLE).
>> Management can enable a capability for the next migration by using the
>> migrate_set_parameter command.
>>
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>> +void qmp_migrate_set_parameter(const char *parameter, Error **errp)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +    int i;
>> +
>> +    if (s->state == MIG_STATE_ACTIVE) {
>> +        error_set(errp, QERR_MIGRATION_ACTIVE);
>> +        return;
>> +    }
>> +
>> +    for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
>> +        if (strcmp(parameter, MigrationCapability_lookup[i]) == 0) {
>> +            s->enabled_capabilities[i] = true;
>> +            return;
>> +        }
>> +    }
>> +
>> +    error_set(errp, QERR_INVALID_PARAMETER, parameter);
>> +}
> 
> Two things here:
> - Is there a way to disable capabilities?  It seems not.

In this implementation we can't disable a capability. Do you see a need to add it?

> - Would we want capabilities that are not "bool" in the future?  Just
>   thinking aloud; I haven't thought a lot about this.  Making it a
>   parameter would make the previous comment trivial to fix: cap:true vs
>   cap:false, or whatever syntax we want.
That is a good idea; I will change it in the next patch set.

Orit
> 
>>      memset(s, 0, sizeof(*s));
>>      s->bandwidth_limit = bandwidth_limit;
>>      s->params = *params;
>> +    memcpy(s->enabled_capabilities, enabled_capabilities,
>> +           sizeof(enabled_capabilities));
>>  
>> -    s->bandwidth_limit = bandwidth_limit;
>>      s->state = MIG_STATE_SETUP;
> 
> Nice catch/cleanup.
> 
> 
>> diff --git a/savevm.c b/savevm.c
>> index dd66f2c..42937a0 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -1711,7 +1711,7 @@ static int qemu_savevm_state(QEMUFile *f)
>>      int ret;
>>      MigrationParams params = {
>>          .blk = 0,
>> -        .shared = 0
>> +        .shared = 0,
>>      };
>>  
>>      if (qemu_savevm_state_blocked(NULL)) {
> 
> Doesn't this belong in the previous patch?
> 
> Later, Juan.

* Re: [Qemu-devel] [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live
  2012-06-01 11:42   ` Juan Quintela
@ 2012-06-06  2:13     ` Orit Wasserman
  2012-06-07 10:38       ` Juan Quintela
  0 siblings, 1 reply; 26+ messages in thread
From: Orit Wasserman @ 2012-06-06  2:13 UTC
  To: quintela
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, Benoit Hudzia,
	mdroth, blauwirbel, Petter Svard, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

On 06/01/2012 02:42 PM, Juan Quintela wrote:
> Orit Wasserman <owasserm@redhat.com> wrote:
>> On outgoing migration, check whether the page is cached and has
>> changed; if so, send the compressed page using the save_xbzrle_page function.
>> On incoming migration, check whether RAM_SAVE_FLAG_XBZRLE is set
>> and, if so, decompress the page (using the load_xbrle function).
> 
> 
> This patch could be split to ease review.
Sure.
> 
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -43,6 +43,15 @@
>>  #include "hw/smbios.h"
>>  #include "exec-memory.h"
>>  #include "hw/pcspk.h"
>> +#include "qemu/cache.h"
>> +
>> +#ifdef DEBUG_ARCH_INIT
>> +#define DPRINTF(fmt, ...) \
>> +    do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
>> +#else
>> +#define DPRINTF(fmt, ...) \
>> +    do { } while (0)
>> +#endif
> 
> Independent of xbzrle.
> 
>>  
>>  #ifdef TARGET_SPARC
>>  int graphic_width = 1024;
>> @@ -94,6 +103,7 @@ const uint32_t arch_type = QEMU_ARCH;
>>  #define RAM_SAVE_FLAG_PAGE     0x08
>>  #define RAM_SAVE_FLAG_EOS      0x10
>>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>> +#define RAM_SAVE_FLAG_XBZRLE   0x40
>>  
>>  #ifdef __ALTIVEC__
>>  #include <altivec.h>
>> @@ -157,6 +167,22 @@ static int is_dup_page(uint8_t *page)
>>      return 1;
>>  }
>>  
>> +/* XBZRLE (Xor Based Zero Length Encoding */
>> +typedef struct XBZRLEHeader {
>> +    uint32_t xh_cksum;
> 
> We are still not using this value, and we are sending it anyway (with a
> value of zero).  What happens when we start using it for a checksum, and
> we migrate to a new version that "expects" it to be valid?  I would
> prefer not to send it, or to send the correct value.
I think I will remove it; a checksum should be used for all of migration, not just XBZRLE.
I guess we can add it to the protocol in the future.
> 
>> +    uint16_t xh_len;
>> +    uint8_t xh_flags;
>> +} XBZRLEHeader;
>> +
>> +/* struct contains XBZRLE cache and a static page
>> +   used by the compression */
>> +static struct {
>> +    /* buffer used for XBZRLE encoding */
>> +    uint8_t *encoded_buf;
>> +    /* Cache for XBZRLE */
>> +    Cache *cache;
>> +} XBZRLE = {0};
> 
> Use C99 designated initializers, please.
> 
> { .encoded_buf = NULL,
>   .cache = NULL,
> }
> 
> More instances in other parts.
> 
>> +
>>  static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>>          int cont, int flag)
>>  {
>> @@ -169,19 +195,78 @@ static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>>  
>>  }
>>  
>> +#define ENCODING_FLAG_XBZRLE 0x1
>> +
>> +static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
>> +                            ram_addr_t current_addr, RAMBlock *block,
>> +                            ram_addr_t offset, int cont)
>> +{
>> +    int encoded_len = 0, bytes_sent = -1, ret = -1;
>> +    XBZRLEHeader hdr = {0};
>> +    uint8_t *prev_cached_page;
>> +
>> +    /* check to see if page is cached , if not cache and return */
>> +    if (!cache_is_cached(XBZRLE.cache, current_addr)) {
>> +        cache_insert(XBZRLE.cache, current_addr, g_memdup(current_data,
>> +                                                          TARGET_PAGE_SIZE));
>> +        goto done;
>> +    }
>> +
>> +    prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
>> +
>> +    /* XBZRLE encoding (if there is no overflow) */
>> +    encoded_len = xbzrle_encode_buffer(prev_cached_page, current_data,
>> +                                       TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
>> +                                       TARGET_PAGE_SIZE);
>> +    if (encoded_len == 0) {
>> +        bytes_sent = 0;
>> +        DPRINTF("Unmodifed page or overflow skipping\n");
>> +        goto done;
>> +    } else if (encoded_len == -1) {
>> +        bytes_sent = -1;
>> +        DPRINTF("Overflow\n");
>> +        /* update data in the cache */
>> +        memcpy(prev_cached_page, current_data, TARGET_PAGE_SIZE);
>> +        goto done;
>> +    }
>> +
>> +    /* we need to update the data in the cache, in order to get the same data
>> +       we cached we decode the encoded page on the cached data */
>> +    ret = xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len,
>> +                               prev_cached_page, TARGET_PAGE_SIZE);
>> +    g_assert(ret != -1);
>> +
>> +    hdr.xh_len = encoded_len;
>> +    hdr.xh_flags |= ENCODING_FLAG_XBZRLE;
>> +
>> +    /* Send XBZRLE based compressed page */
>> +    save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBZRLE);
>> +    qemu_put_byte(f, hdr.xh_flags);
>> +    qemu_put_be16(f, hdr.xh_len);
>> +    qemu_put_be32(f, hdr.xh_cksum);
>> +    qemu_put_buffer(f, XBZRLE.encoded_buf, encoded_len);
>> +    bytes_sent = encoded_len + sizeof(hdr);
>> +
>> +done:
>> +    return bytes_sent;
>> +}
>> +
>>  static RAMBlock *last_block;
>>  static ram_addr_t last_offset;
>>  
>> -static int ram_save_block(QEMUFile *f)
>> +static int ram_save_block(QEMUFile *f, int stage)
>>  {
>>      RAMBlock *block = last_block;
>>      ram_addr_t offset = last_offset;
>> -    int bytes_sent = 0;
>> +    int bytes_sent = -1;
>>      MemoryRegion *mr;
>> +    ram_addr_t current_addr;
>>  
>>      if (!block)
>>          block = QLIST_FIRST(&ram_list.blocks);
>>  
>> +    current_addr = block->offset + offset;
>> +
>>      do {
>>          mr = block->mr;
>>          if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
>> @@ -198,7 +283,24 @@ static int ram_save_block(QEMUFile *f)
>>                  save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
>>                  qemu_put_byte(f, *p);
>>                  bytes_sent = 1;
>> -            } else {
>> +            } else if (migrate_use_xbzrle()) {
>> +                /* in stage 1 none of the pages are cached so we just want to
>> +                   cache them for next stages, and send the cached copy */
>> +                if (stage == 1) {
>> +                    cache_insert(XBZRLE.cache, current_addr,
>> +                                 g_memdup(p, TARGET_PAGE_SIZE));
>> +                } else {
>> +                    bytes_sent = save_xbzrle_page(f, p, current_addr, block,
>> +                                                  offset, cont);
>> +                }
>> +                /* send the cached page copy for stage 1 and 2*/
>> +                if (stage != 3) {
>> +                    p = get_cached_data(XBZRLE.cache, current_addr);
>> +                }
>> +            }
>> +
>> +            /* either we didn't send yet (we may got XBZRLE overflow) */
>> +            if (bytes_sent == -1) {
>>                  save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
>>                  qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
>>                  bytes_sent = TARGET_PAGE_SIZE;
> 
> 
> I think that code is not right when save_xbzrle_page() returns 0.  That
> means the page hasn't changed since the last time we sent it.  We
> shouldn't break in that case.  Just continue with the next page, right?
> 
You are right, I missed that; it will be fixed.

> On the other hand... why are we doing the stage == 1 test?  Stage 1
> normally sends only part of the pages, so couldn't we use the generic code
> there?  It would just return -1 as bytes_sent and run the same code
> that we have now?

We need to add the pages to the cache in stage 1 (for the next stage),
and there is no need to check whether the page is cached.
We also send the pages from the cache, for consistency.

> 
> Isn't the optimization for stage 3 done backwards?  We are inserting
> the page into the cache even when we are in stage 3.  In stage 3 we
> should:
> - check whether the page is in the cache: if so, do the usual XBZRLE trick
> - if it is not, just send the whole page without inserting it into the
> cache?  We are never going to reuse it, so putting it into the cache
> would not help us at all.  Aren't we just making an extra copy?
Right, there is no need to insert the page into the cache in stage 3; I will remove it.
> 
> 
>>  
>>      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>>  
>>      expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>>  
>> +    DPRINTF("ram_save_live: expected(%ld) <= max(%ld)?\n", expected_time,
>> +        migrate_max_downtime());
>> +
> 
> This belongs in the debugging patch.
> 
>> +    /* load data and decode */
>> +    xbzrle_buf = g_malloc0(TARGET_PAGE_SIZE);
> 
> Can't we have a static buffer of that size and avoid all the
> malloc/free business?  If space is tight, we can always put it in the
> XBZRLE structure and allocate it only for migration.
Good idea.
> 
>> @@ -481,16 +657,33 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
>>              void *host;
>>  
>>              host = host_from_stream_offset(f, addr, flags);
>> +            if (!host) {
>> +                return -EINVAL;
>> +            }
> 
> Why is this check only needed now?
I wish I knew; it looks like it is missing upstream.
Do you think I should fix it separately?

Thanks,
Orit
> 
> Later, Juan.

* Re: [Qemu-devel] [PATCH v11 8/9] Add set_cachesize command
  2012-06-01 11:19   ` Juan Quintela
@ 2012-06-06  2:14     ` Orit Wasserman
  0 siblings, 0 replies; 26+ messages in thread
From: Orit Wasserman @ 2012-06-06  2:14 UTC
  To: quintela
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, Benoit Hudzia,
	mdroth, blauwirbel, Petter Svard, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

On 06/01/2012 02:19 PM, Juan Quintela wrote:
> Orit Wasserman <owasserm@redhat.com> wrote:
>> Change the XBZRLE cache size, in bytes (the size should be a power of 2).
>> If the XBZRLE cache is too small, there will be many cache misses.
>>
>> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
>> Signed-off-by: Petter Svard <petters@cs.umu.se>
>> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> 
>> +void qmp_migrate_set_cachesize(int64_t value, Error **errp)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +
>> +    /* Check for truncation */
>> +    if (value != (size_t)value) {
>> +        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
>> +                  "exceeding address space");
>> +        return;
>> +    }
>> +
>> +    value = MIN(UINT64_MAX, value);
> 
> This looks fishy to say the least.  value is signed.  Is there any way
> that UINT64_MAX is going to be smaller than value?
> 
You are right.
I will remove it.

Orit

* Re: [Qemu-devel] [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live
  2012-06-06  2:13     ` Orit Wasserman
@ 2012-06-07 10:38       ` Juan Quintela
  0 siblings, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2012-06-07 10:38 UTC
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, Benoit Hudzia,
	mdroth, blauwirbel, Petter Svard, chegu_vinod, avi,
	Aidan Shribman, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> On 06/01/2012 02:42 PM, Juan Quintela wrote:
>> We are still not using this value, and we are sending it anyway (with a
>> value of zero).  What happens when we start using it for a checksum, and
>> we migrate to a new version that "expects" it to be valid?  I would
>> prefer not to send it, or to send the correct value.
> I think I will remove it; a checksum should be used for all of migration,
> not just XBZRLE.
> I guess we can add it to the protocol in the future.

Agreed.
>> On the other hand... why are we doing the stage == 1 test?  Stage 1
>> normally sends only part of the pages, so couldn't we use the generic code
>> there?  It would just return -1 as bytes_sent and run the same code
>> that we have now?
>
> We need to add the pages to the cache in stage 1 (for the next stage),
> and there is no need to check whether the page is cached.
> We also send the pages from the cache, for consistency.

My question is: if we remove the check and just call the other function,
everything works, no?  So why add the special case?

If it doesn't work, we need to change it, because nothing guarantees that
we fill the cache during stage .

>>>              void *host;
>>>  
>>>              host = host_from_stream_offset(f, addr, flags);
>>> +            if (!host) {
>>> +                return -EINVAL;
>>> +            }
>> 
>> Why is this check only needed now?
> I wish I knew; it looks like it is missing upstream.
> Do you think I should fix it separately?

Yep.

Thanks, Juan.

* Re: [Qemu-devel] [PATCH v11 2/9] Add migration capabilites
  2012-06-06  1:48     ` Orit Wasserman
@ 2012-06-07 10:41       ` Juan Quintela
  0 siblings, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2012-06-07 10:41 UTC
  To: Orit Wasserman
  Cc: peter.maydell, aliguori, stefanha, qemu-devel, mdroth, blauwirbel,
	chegu_vinod, avi, pbonzini, eblake

Orit Wasserman <owasserm@redhat.com> wrote:
> On 06/01/2012 01:57 PM, Juan Quintela wrote:
>> Orit Wasserman <owasserm@redhat.com> wrote:
>>> Add migration capabilities that can be queried by management.
>>> Management can query the source QEMU and the destination QEMU in order to
>>> verify that both support a given migration capability (currently only XBZRLE).
>>> Management can enable a capability for the next migration by using the
>>> migrate_set_parameter command.
>>>
>>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>>> +void qmp_migrate_set_parameter(const char *parameter, Error **errp)
>>> +{
>>> +    MigrationState *s = migrate_get_current();
>>> +    int i;
>>> +
>>> +    if (s->state == MIG_STATE_ACTIVE) {
>>> +        error_set(errp, QERR_MIGRATION_ACTIVE);
>>> +        return;
>>> +    }
>>> +
>>> +    for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
>>> +        if (strcmp(parameter, MigrationCapability_lookup[i]) == 0) {
>>> +            s->enabled_capabilities[i] = true;
>>> +            return;
>>> +        }
>>> +    }
>>> +
>>> +    error_set(errp, QERR_INVALID_PARAMETER, parameter);
>>> +}
>> 
>> Two things here:
>> - Is there a way to disable capabilities?  It seems not.
>
> In this implementation we can't disable a capability. Do you see a
> need to add it?

As we continue adding capabilities, I guess it is going to be needed,
at least for testing.  Especially if we take the path that
Anthony suggested:

set_capabilities(intersection(caps_source, caps_target))
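A minimal sketch of the intersection step, assuming capabilities are kept as a bool array indexed by capability, like `s->enabled_capabilities[]` in the quoted patch. `caps_intersect()` is hypothetical, and `CAP_EXAMPLE` is a made-up second entry included only to make the example non-trivial (the patch defines only XBZRLE).

```c
#include <assert.h>
#include <stdbool.h>

/* CAP_XBZRLE mirrors the patch; CAP_EXAMPLE is hypothetical. */
enum { CAP_XBZRLE, CAP_EXAMPLE, CAP_MAX };

/* out[i] is true only when both source and destination support cap i. */
static void caps_intersect(const bool src[CAP_MAX], const bool dst[CAP_MAX],
                           bool out[CAP_MAX])
{
    for (int i = 0; i < CAP_MAX; i++) {
        out[i] = src[i] && dst[i];
    }
}
```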

If we do something like that, and we _know_ that we don't want a
capability because we know it doesn't work for our load, it sounds like a
good idea to have a way to disable it.  The other reason is the next comment:
if we could have a capability that is _not_ a bool, we need to be able
to assign it a value anyway.  Note that I still don't know whether we are
going to need it, but I can see a reason for it either way.
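One way the "cap:true" / "cap:false" syntax floated earlier in the thread could look, as a sketch: a single parser handles both enabling and disabling. The function `set_capability()`, the `name:value` syntax and the `"xbzrle"` spelling are assumptions for illustration; in the patch the names presumably come from the QAPI-generated `MigrationCapability_lookup[]` table.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for MigrationCapability_lookup[] (spelling assumed). */
static const char *const cap_names[] = { "xbzrle" };
#define CAP_COUNT (sizeof(cap_names) / sizeof(cap_names[0]))

/* Hypothetical "name:true" / "name:false" parser.
 * Returns 0 on success, -1 on a malformed string or unknown name. */
static int set_capability(bool *caps, const char *param)
{
    const char *colon = strchr(param, ':');
    size_t name_len;
    bool on;

    if (!colon) {
        return -1;
    }
    name_len = (size_t)(colon - param);
    if (strcmp(colon + 1, "true") == 0) {
        on = true;
    } else if (strcmp(colon + 1, "false") == 0) {
        on = false;
    } else {
        return -1;
    }
    for (size_t i = 0; i < CAP_COUNT; i++) {
        if (strlen(cap_names[i]) == name_len &&
            strncmp(param, cap_names[i], name_len) == 0) {
            caps[i] = on;
            return 0;
        }
    }
    return -1;
}
```

Because the value is carried in the string, the same entry point could later accept non-bool values without a new command.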

>
>> - Would we want capabilities that are not "bool" in the future?  Just
>>   thinking aloud; I haven't thought a lot about this.  Making it a
>>   parameter would make the previous comment trivial to fix: cap:true vs
>>   cap:false, or whatever syntax we want.
> That is a good idea; I will change it in the next patch set.

Thanks, Juan.

end of thread, other threads:[~2012-06-07 10:42 UTC | newest]

Thread overview: 26+ messages
2012-05-22 12:56 [Qemu-devel] [PATCH v11 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 1/9] Add MigrationParams structure Orit Wasserman
2012-06-01 10:51   ` Juan Quintela
2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 2/9] Add migration capabilites Orit Wasserman
2012-05-22 13:08   ` Eric Blake
2012-06-01 10:57   ` Juan Quintela
2012-06-06  1:48     ` Orit Wasserman
2012-06-07 10:41       ` Juan Quintela
2012-05-22 12:56 ` [Qemu-devel] [PATCH v11 3/9] Add XBZRLE documentation Orit Wasserman
2012-05-22 13:13   ` Eric Blake
2012-06-01 10:58   ` Juan Quintela
2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 4/9] Add cache handling functions Orit Wasserman
2012-06-01 11:01   ` Juan Quintela
2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 5/9] Add uleb encoding/decoding functions Orit Wasserman
2012-06-01 11:04   ` Juan Quintela
2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 6/9] Add save_block_hdr function Orit Wasserman
2012-06-01 11:04   ` Juan Quintela
2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live Orit Wasserman
2012-06-01 11:42   ` Juan Quintela
2012-06-06  2:13     ` Orit Wasserman
2012-06-07 10:38       ` Juan Quintela
2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 8/9] Add set_cachesize command Orit Wasserman
2012-06-01 11:19   ` Juan Quintela
2012-06-06  2:14     ` Orit Wasserman
2012-05-22 12:57 ` [Qemu-devel] [PATCH v11 9/9] Add XBZRLE statistics Orit Wasserman
2012-06-01 11:10   ` Juan Quintela
