[Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app
@ 2012-05-16 11:59 Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata <yamahata@valinux.co.jp> Orit Wasserman
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, Petter Svard, stefanha, mdroth, blauwirbel,
	Orit Wasserman, Benoit Hudzia, avi, pbonzini, Aidan Shribman

Changes from v9:
	- move cache implementation to separate files. Kept our own implementation because GCache or GHashTable have no size limit.
	- Add migrate_set_parameter function
	- removed XBZRLE option from migrate command
	- add cache size information to query_migrate command
	- add documantation file
	- write/read the exact XBZRLE header format
	- fix other review comments by Anthony and Juan

Changes from v8:
	Implement more effiecent cache_resize method
	fix set_cachesize command 

Changes from v7:
	Copy current page before encoding it, this will prevents page content
	change during the encoding.
	Allow changing the cache size during an active migration.
	Fix comments by Avi.

Changes from v6:
 1) add assert checks to ULEB encoding/decoding
 2) no need to send last zero run
	
Changes from v5:
1) Add migration capabilities
2) Use ULEB to encode run length
3) Do not send unmodified (dirty) page
3) Fix other patch comments

Using GCache or GHashTable requires allocating new buffer on every content change and have no size limit ,
so I decided to keep the simple cache implementation.

Changes from v4:
1) Rebase
2) divide patch into 9 patches
3) move memory allocation into cache_insert

Future work :
     Use SSE for encoding.
     Page ranking acording to their dirty rate and automatic activation/deactivation of the feature - will be sent in a separate patch series.	

By using XBZRLE (Xor Based Zero Run Length Encoding) we can reduce VM downtime
and total live-migration time of VMs running memory write intensive workloads
typical of large enterprise applications such as SAP ERP Systems, and generally
speaking for any application with a sparse memory update pattern.

The compression format uses the fact that we will have many zero (zero represents
an unchanged value). 
We repesent the page data delta by zero and non zero runs.
We represent a zero run with it's length (in bytes). 
We represent a non zero run with it's length (in bytes) and the data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from an LRU cache (default size of 512 MB). The
receiving side uses the existing page content and XBZRLE to decode the new page
content.

This is a more compact way to store the delta than the previous version.

This work was originally based on research results published VEE 2011: Evaluation of
Delta Compression Techniques for Efficient Live Migration of Large Virtual
Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
XBRLE was improved further using XBZRLE instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
ideal for in-line, real-time encoding such as is needed for live-migration.

A typical usage scenario:
    {qemu} migrate_set_cachesize 256m
    {qemu} migrate_set_parameter xbzrle
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    ...
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    cache size: D bytes
    xbzrle transferred: E kbytes
    xbzrle pages: F pages
    xbzrle cache miss: G
    xbzrle overflow : H

Testing: live migration with XBZRLE completed in 110 seconds, without live
migration was not able to complete.

A simple synthetic memory r/w load generator:
..    include <stdlib.h>
..    include <stdio.h>
..    int main()
..    {
..        char *buf = (char *) calloc(4096, 4096);
..        while (1) {
..            int i;
..            for (i = 0; i < 4096 * 4; i++) {
..                buf[i * 4096 / 4]++;
..            }
..            printf(".");
..        }
..    }

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>

Orit Wasserman (9):
  From: Isaku Yamahata <yamahata@valinux.co.jp>
  Add migration capabilites
  Add XBZRLE documentation
  Add cache handling functions
  Add uleb encoding/decoding functions
  Add save_block_hdr function
  Add XBZRLE to ram_save_block and ram_save_live
  Add set_cachesize command
  Add XBZRLE statistics

 Makefile.objs        |    1 +
 arch_init.c          |  317 +++++++++++++++++++++++++++++++++++++++++++++----
 block-migration.c    |    8 +-
 cache.c              |  212 +++++++++++++++++++++++++++++++++
 cutils.c             |   29 +++++
 docs/xbzrle.txt      |   97 +++++++++++++++
 hmp-commands.hx      |   31 +++++
 hmp.c                |   67 +++++++++++
 hmp.h                |    3 +
 include/qemu/cache.h |   81 +++++++++++++
 migration.c          |  134 ++++++++++++++++++++--
 migration.h          |   29 +++++-
 monitor.c            |    7 +
 qapi-schema.json     |   84 +++++++++++++-
 qemu-common.h        |   14 +++
 qmp-commands.hx      |   97 +++++++++++++++
 savevm.c             |  102 ++++++++++++++++-
 sysemu.h             |    3 +-
 vmstate.h            |    2 +-
 19 files changed, 1270 insertions(+), 48 deletions(-)
 create mode 100644 cache.c
 create mode 100644 docs/xbzrle.txt
 create mode 100644 include/qemu/cache.h

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata <yamahata@valinux.co.jp>
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  2012-05-16 12:43   ` Peter Maydell
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 2/9] Add migration capabilites Orit Wasserman
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, stefanha, mdroth, blauwirbel, Orit Wasserman,
	avi, pbonzini, Isaku Yamahata

Add MigrationParams structure

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 block-migration.c |    8 ++++----
 migration.c       |   13 ++++++++-----
 migration.h       |    8 ++++++--
 qemu-common.h     |    1 +
 savevm.c          |   11 ++++++++---
 sysemu.h          |    3 ++-
 vmstate.h         |    2 +-
 7 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index fd2ffff..b95b4e1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -700,13 +700,13 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static void block_set_params(const MigrationParams *params, void *opaque)
 {
-    block_mig_state.blk_enable = blk_enable;
-    block_mig_state.shared_base = shared_base;
+    block_mig_state.blk_enable = params->blk;
+    block_mig_state.shared_base = params->shared;
 
     /* shared base means that blk_enable = 1 */
-    block_mig_state.blk_enable |= shared_base;
+    block_mig_state.blk_enable |= params->shared;
 }
 
 void blk_mig_init(void)
diff --git a/migration.c b/migration.c
index f9e968e..9d1d925 100644
--- a/migration.c
+++ b/migration.c
@@ -352,7 +352,7 @@ void migrate_fd_connect(MigrationState *s)
                                       migrate_fd_close);
 
     DPRINTF("beginning savevm\n");
-    ret = qemu_savevm_state_begin(s->file, s->blk, s->shared);
+    ret = qemu_savevm_state_begin(s->file, &s->params);
     if (ret < 0) {
         DPRINTF("failed, %d\n", ret);
         migrate_fd_error(s);
@@ -361,15 +361,14 @@ void migrate_fd_connect(MigrationState *s)
     migrate_fd_put_ready(s);
 }
 
-static MigrationState *migrate_init(int blk, int inc)
+static MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
 
     memset(s, 0, sizeof(*s));
     s->bandwidth_limit = bandwidth_limit;
-    s->blk = blk;
-    s->shared = inc;
+    s->params = *params;
 
     s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
@@ -394,9 +393,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  Error **errp)
 {
     MigrationState *s = migrate_get_current();
+    MigrationParams params;
     const char *p;
     int ret;
 
+    params.blk = blk;
+    params.shared = inc;
+
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -411,7 +414,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }
 
-    s = migrate_init(blk, inc);
+    s = migrate_init(&params);
 
     if (strstart(uri, "tcp:", &p)) {
         ret = tcp_start_outgoing_migration(s, p);
diff --git a/migration.h b/migration.h
index 691b367..9e3bba7 100644
--- a/migration.h
+++ b/migration.h
@@ -19,6 +19,11 @@
 #include "notify.h"
 #include "error.h"
 
+struct MigrationParams {
+    int blk;
+    int shared;
+};
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -31,8 +36,7 @@ struct MigrationState
     int (*close)(MigrationState *s);
     int (*write)(MigrationState *s, const void *buff, size_t size);
     void *opaque;
-    int blk;
-    int shared;
+    MigrationParams params;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-common.h b/qemu-common.h
index 50f659a..30c59c8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -17,6 +17,7 @@ typedef struct DeviceState DeviceState;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct MigrationParams MigrationParams;
 
 /* we put basic includes here to avoid repeating them in device drivers */
 #include <stdlib.h>
diff --git a/savevm.c b/savevm.c
index 2d18bab..dd66f2c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1561,7 +1561,8 @@ bool qemu_savevm_state_blocked(Error **errp)
     return false;
 }
 
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
+int qemu_savevm_state_begin(QEMUFile *f,
+                            const MigrationParams *params)
 {
     SaveStateEntry *se;
     int ret;
@@ -1570,7 +1571,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
         if(se->set_params == NULL) {
             continue;
 	}
-	se->set_params(blk_enable, shared, se->opaque);
+        se->set_params(params, se->opaque);
     }
     
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1708,13 +1709,17 @@ void qemu_savevm_state_cancel(QEMUFile *f)
 static int qemu_savevm_state(QEMUFile *f)
 {
     int ret;
+    MigrationParams params = {
+        .blk = 0,
+        .shared = 0
+    };
 
     if (qemu_savevm_state_blocked(NULL)) {
         ret = -EINVAL;
         goto out;
     }
 
-    ret = qemu_savevm_state_begin(f, 0, 0);
+    ret = qemu_savevm_state_begin(f, &params);
     if (ret < 0)
         goto out;
 
diff --git a/sysemu.h b/sysemu.h
index bc2c788..6540c79 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -77,7 +77,8 @@ void do_info_snapshots(Monitor *mon);
 void qemu_announce_self(void);
 
 bool qemu_savevm_state_blocked(Error **errp);
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared);
+int qemu_savevm_state_begin(QEMUFile *f,
+                            const MigrationParams *params);
 int qemu_savevm_state_iterate(QEMUFile *f);
 int qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(QEMUFile *f);
diff --git a/vmstate.h b/vmstate.h
index 82d97ae..5af45e0 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -26,7 +26,7 @@
 #ifndef QEMU_VMSTATE_H
 #define QEMU_VMSTATE_H 1
 
-typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
+typedef void SaveSetParamsHandler(const MigrationParams *params, void * opaque);
 typedef void SaveStateHandler(QEMUFile *f, void *opaque);
 typedef int SaveLiveStateHandler(QEMUFile *f, int stage, void *opaque);
 typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata <yamahata@valinux.co.jp>
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata <yamahata@valinux.co.jp> Orit Wasserman
@ 2012-05-16 12:43   ` Peter Maydell
  2012-05-16 16:50     ` [Qemu-devel] [PATCH v10 1/9] Add MigrationParams structure Orit Wasserman
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2012-05-16 12:43 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: aliguori, quintela, stefanha, mdroth, qemu-devel, blauwirbel,
	Isaku Yamahata, avi, pbonzini

On 16 May 2012 12:59, Orit Wasserman <owasserm@redhat.com> wrote:
> Add MigrationParams structure
>
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

You seem to have managed to get the From: authorship line in
the subject commit summary line somehow in this patch...

-- PMM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 1/9] Add MigrationParams structure
  2012-05-16 12:43   ` Peter Maydell
@ 2012-05-16 16:50     ` Orit Wasserman
  0 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 16:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, Isaku Yamahata

From: Isaku Yamahata <yamahata@valinux.co.jp>

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 block-migration.c |    8 ++++----
 migration.c       |   13 ++++++++-----
 migration.h       |    8 ++++++--
 qemu-common.h     |    1 +
 savevm.c          |   11 ++++++++---
 sysemu.h          |    3 ++-
 vmstate.h         |    2 +-
 7 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index fd2ffff..b95b4e1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -700,13 +700,13 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static void block_set_params(const MigrationParams *params, void *opaque)
 {
-    block_mig_state.blk_enable = blk_enable;
-    block_mig_state.shared_base = shared_base;
+    block_mig_state.blk_enable = params->blk;
+    block_mig_state.shared_base = params->shared;
 
     /* shared base means that blk_enable = 1 */
-    block_mig_state.blk_enable |= shared_base;
+    block_mig_state.blk_enable |= params->shared;
 }
 
 void blk_mig_init(void)
diff --git a/migration.c b/migration.c
index f9e968e..9d1d925 100644
--- a/migration.c
+++ b/migration.c
@@ -352,7 +352,7 @@ void migrate_fd_connect(MigrationState *s)
                                       migrate_fd_close);
 
     DPRINTF("beginning savevm\n");
-    ret = qemu_savevm_state_begin(s->file, s->blk, s->shared);
+    ret = qemu_savevm_state_begin(s->file, &s->params);
     if (ret < 0) {
         DPRINTF("failed, %d\n", ret);
         migrate_fd_error(s);
@@ -361,15 +361,14 @@ void migrate_fd_connect(MigrationState *s)
     migrate_fd_put_ready(s);
 }
 
-static MigrationState *migrate_init(int blk, int inc)
+static MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
 
     memset(s, 0, sizeof(*s));
     s->bandwidth_limit = bandwidth_limit;
-    s->blk = blk;
-    s->shared = inc;
+    s->params = *params;
 
     s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
@@ -394,9 +393,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  Error **errp)
 {
     MigrationState *s = migrate_get_current();
+    MigrationParams params;
     const char *p;
     int ret;
 
+    params.blk = blk;
+    params.shared = inc;
+
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -411,7 +414,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }
 
-    s = migrate_init(blk, inc);
+    s = migrate_init(&params);
 
     if (strstart(uri, "tcp:", &p)) {
         ret = tcp_start_outgoing_migration(s, p);
diff --git a/migration.h b/migration.h
index 691b367..9e3bba7 100644
--- a/migration.h
+++ b/migration.h
@@ -19,6 +19,11 @@
 #include "notify.h"
 #include "error.h"
 
+struct MigrationParams {
+    int blk;
+    int shared;
+};
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -31,8 +36,7 @@ struct MigrationState
     int (*close)(MigrationState *s);
     int (*write)(MigrationState *s, const void *buff, size_t size);
     void *opaque;
-    int blk;
-    int shared;
+    MigrationParams params;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-common.h b/qemu-common.h
index 50f659a..30c59c8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -17,6 +17,7 @@ typedef struct DeviceState DeviceState;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct MigrationParams MigrationParams;
 
 /* we put basic includes here to avoid repeating them in device drivers */
 #include <stdlib.h>
diff --git a/savevm.c b/savevm.c
index 2d18bab..dd66f2c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1561,7 +1561,8 @@ bool qemu_savevm_state_blocked(Error **errp)
     return false;
 }
 
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
+int qemu_savevm_state_begin(QEMUFile *f,
+                            const MigrationParams *params)
 {
     SaveStateEntry *se;
     int ret;
@@ -1570,7 +1571,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
         if(se->set_params == NULL) {
             continue;
 	}
-	se->set_params(blk_enable, shared, se->opaque);
+        se->set_params(params, se->opaque);
     }
     
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1708,13 +1709,17 @@ void qemu_savevm_state_cancel(QEMUFile *f)
 static int qemu_savevm_state(QEMUFile *f)
 {
     int ret;
+    MigrationParams params = {
+        .blk = 0,
+        .shared = 0
+    };
 
     if (qemu_savevm_state_blocked(NULL)) {
         ret = -EINVAL;
         goto out;
     }
 
-    ret = qemu_savevm_state_begin(f, 0, 0);
+    ret = qemu_savevm_state_begin(f, &params);
     if (ret < 0)
         goto out;
 
diff --git a/sysemu.h b/sysemu.h
index bc2c788..6540c79 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -77,7 +77,8 @@ void do_info_snapshots(Monitor *mon);
 void qemu_announce_self(void);
 
 bool qemu_savevm_state_blocked(Error **errp);
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared);
+int qemu_savevm_state_begin(QEMUFile *f,
+                            const MigrationParams *params);
 int qemu_savevm_state_iterate(QEMUFile *f);
 int qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(QEMUFile *f);
diff --git a/vmstate.h b/vmstate.h
index 82d97ae..5af45e0 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -26,7 +26,7 @@
 #ifndef QEMU_VMSTATE_H
 #define QEMU_VMSTATE_H 1
 
-typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
+typedef void SaveSetParamsHandler(const MigrationParams *params, void * opaque);
 typedef void SaveStateHandler(QEMUFile *f, void *opaque);
 typedef int SaveLiveStateHandler(QEMUFile *f, int stage, void *opaque);
 typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 2/9] Add migration capabilites
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata <yamahata@valinux.co.jp> Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation Orit Wasserman
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, stefanha, mdroth, blauwirbel, Orit Wasserman,
	avi, pbonzini

Add migration capabiltes that can be queried by the management.
The managment can query the source QEMU and the destination QEMU in order to
verify both support some  migration capability (currently only XBZRLE).
The managment can enable a capabilty for the next migration by using
migrate_set_parameter command.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 hmp-commands.hx  |   16 ++++++++++++++++
 hmp.c            |   41 +++++++++++++++++++++++++++++++++++++++++
 hmp.h            |    2 ++
 migration.c      |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 migration.h      |    2 ++
 monitor.c        |    7 +++++++
 qapi-schema.json |   46 +++++++++++++++++++++++++++++++++++++++++++++-
 qmp-commands.hx  |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 savevm.c         |    2 +-
 9 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 18cb415..e14e7be 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -861,6 +861,20 @@ Set maximum tolerated downtime (in seconds) for migration.
 ETEXI
 
     {
+        .name       = "migrate_set_parameter",
+        .args_type  = "parameter:s",
+        .params     = "parameter",
+        .help       = "Enable the usage of a capability for migration",
+        .mhandler.cmd = hmp_migrate_set_parameter,
+    },
+
+STEXI
+@item migrate_set_parameter @var{parameter}
+@findex migrate_set_parameter
+Enable the usage of a capability @var{parameter} for migration.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
@@ -1393,6 +1407,8 @@ show CPU statistics
 show user network stack connection states
 @item info migrate
 show migration status
+@item info migration_capabilities
+show migration capabilities
 @item info balloon
 show balloon information
 @item info qtree
diff --git a/hmp.c b/hmp.c
index 1f9fe0e..e73132b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -128,9 +128,18 @@ void hmp_info_mice(Monitor *mon)
 void hmp_info_migrate(Monitor *mon)
 {
     MigrationInfo *info;
+    MigrationCapabilityInfoList *cap;
 
     info = qmp_query_migrate(NULL);
 
+    if (info->has_params && info->params) {
+        monitor_printf(mon, "params: ");
+        for (cap = info->params; cap; cap = cap->next) {
+            monitor_printf(mon, "%s",
+                           MigrationCapability_lookup[cap->value->capability]);
+        }
+        monitor_printf(mon, "\n");
+    }
     if (info->has_status) {
         monitor_printf(mon, "Migration status: %s\n", info->status);
     }
@@ -156,6 +165,24 @@ void hmp_info_migrate(Monitor *mon)
     qapi_free_MigrationInfo(info);
 }
 
+void hmp_info_migration_capabilities(Monitor *mon)
+{
+    MigrationCapabilityInfoList *caps_list, *cap;
+
+    caps_list = qmp_query_migration_capabilities(NULL);
+    if (!caps_list) {
+        monitor_printf(mon, "No migration capabilities found\n");
+        return;
+    }
+
+    for (cap = caps_list; cap; cap = cap->next) {
+        monitor_printf(mon, "%s ",
+                       MigrationCapability_lookup[cap->value->capability]);
+    }
+
+    qapi_free_MigrationCapabilityInfoList(caps_list);
+}
+
 void hmp_info_cpus(Monitor *mon)
 {
     CpuInfoList *cpu_list, *cpu;
@@ -730,6 +757,20 @@ void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict)
     qmp_migrate_set_speed(value, NULL);
 }
 
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
+{
+    const char *value = qdict_get_str(qdict, "parameter");
+    Error *err = NULL;
+
+    qmp_migrate_set_parameter(value, &err);
+
+    if (err) {
+        monitor_printf(mon, "migrate_set_parameter: %s\n",
+                       error_get_pretty(err));
+        error_free(err);
+    }
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 443b812..5f9d842 100644
--- a/hmp.h
+++ b/hmp.h
@@ -25,6 +25,7 @@ void hmp_info_uuid(Monitor *mon);
 void hmp_info_chardev(Monitor *mon);
 void hmp_info_mice(Monitor *mon);
 void hmp_info_migrate(Monitor *mon);
+void hmp_info_migration_capabilities(Monitor *mon);
 void hmp_info_cpus(Monitor *mon);
 void hmp_info_block(Monitor *mon);
 void hmp_info_blockstats(Monitor *mon);
@@ -51,6 +52,7 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration.c b/migration.c
index 9d1d925..66e71a3 100644
--- a/migration.c
+++ b/migration.c
@@ -117,10 +117,22 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 {
     MigrationInfo *info = g_malloc0(sizeof(*info));
     MigrationState *s = migrate_get_current();
+    int i;
 
     switch (s->state) {
     case MIG_STATE_SETUP:
-        /* no migration has happened ever */
+        /* no migration has happened ever show enabled capabilities */
+        for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+            if (s->enabled_capabilities[i]) {
+                if (!info->has_params) {
+                    info->params = g_malloc0(sizeof(*info->params));
+                    info->has_params = true;
+                }
+                info->params->value = g_malloc(sizeof(*info->params->value));
+                info->params->value->capability = i;
+                info->params->next = NULL;
+            }
+        }
         break;
     case MIG_STATE_ACTIVE:
         info->has_status = true;
@@ -157,6 +169,38 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     return info;
 }
 
+MigrationCapabilityInfoList *qmp_query_migration_capabilities(Error **errp)
+{
+    MigrationCapabilityInfoList *caps_list = g_malloc0(sizeof(*caps_list));
+
+    caps_list->value = g_malloc(sizeof(*caps_list->value));
+    caps_list->value->capability = MIGRATION_CAPABILITY_XBZRLE;
+    caps_list->next = NULL;
+
+    return caps_list;
+}
+
+
+void qmp_migrate_set_parameter(const char *parameter, Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+    int i;
+
+    if (s->state == MIG_STATE_ACTIVE) {
+        error_set(errp, QERR_MIGRATION_ACTIVE);
+        return;
+    }
+
+    for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+        if (strcmp(parameter, MigrationCapability_lookup[i]) == 0) {
+            s->enabled_capabilities[i] = true;
+            return;
+        }
+    }
+
+    error_set(errp, QERR_INVALID_PARAMETER, parameter);
+}
+
 /* shared migration helpers */
 
 static int migrate_fd_cleanup(MigrationState *s)
@@ -365,12 +409,17 @@ static MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
+    bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
+
+    memcpy(enabled_capabilities, s->enabled_capabilities,
+           sizeof(enabled_capabilities));
 
     memset(s, 0, sizeof(*s));
     s->bandwidth_limit = bandwidth_limit;
     s->params = *params;
+    memcpy(s->enabled_capabilities, enabled_capabilities,
+           sizeof(enabled_capabilities));
 
-    s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
 
     return s;
diff --git a/migration.h b/migration.h
index 9e3bba7..e42b0a0 100644
--- a/migration.h
+++ b/migration.h
@@ -18,6 +18,7 @@
 #include "qemu-common.h"
 #include "notify.h"
 #include "error.h"
+#include "qapi-types.h"
 
 struct MigrationParams {
     int blk;
@@ -37,6 +38,7 @@ struct MigrationState
     int (*write)(MigrationState *s, const void *buff, size_t size);
     void *opaque;
     MigrationParams params;
+    bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/monitor.c b/monitor.c
index 12a6fe2..0233bc3 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2558,6 +2558,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.info = hmp_info_migrate,
     },
     {
+        .name       = "migration_capabilities",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show migration capabilities",
+        .mhandler.info = hmp_info_migration_capabilities,
+    },
+    {
         .name       = "balloon",
         .args_type  = "",
         .params     = "",
diff --git a/qapi-schema.json b/qapi-schema.json
index 2ca7195..2887c51 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -264,7 +264,7 @@
 ##
 { 'type': 'MigrationInfo',
   'data': {'*status': 'str', '*ram': 'MigrationStats',
-           '*disk': 'MigrationStats'} }
+           '*disk': 'MigrationStats', '*params': ['MigrationCapabilityInfo']} }
 
 ##
 # @query-migrate
@@ -278,6 +278,50 @@
 { 'command': 'query-migrate', 'returns': 'MigrationInfo' }
 
 ##
+# @MigrationCapability
+#
+# Migration capabilities enumaration
+#
+# @xbzrle: current migration supports xbzrle
+#
+# Since: 1.1
+##
+{ 'enum': 'MigrationCapability',
+  'data': ['xbzrle'] }
+
+##
+# @MigrationCapabilityInfo
+#
+# Migration capability information
+#
+# @capability: capability enum
+#
+# Since: 1.2
+##
+{ 'type': 'MigrationCapabilityInfo',
+  'data': { 'capability' : 'MigrationCapability'} }
+
+##
+# @query-migration-capabilities
+#
+# Returns information about current migration process capabilties.
+#
+# Returns: @MigrationCapabilityInfo list
+#
+# Since: 1.2
+##
+{ 'command': 'query-migration-capabilities', 'returns': ['MigrationCapabilityInfo'] }
+
+##
+# @migrate_set_parameter
+#
+# Set the following migration parameters (like xbzrle )
+##
+# Since: 1.2
+##
+{ 'command': 'migrate-set-parameter', 'data': { 'parameter': 'str' } }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index c810c74..f276e08 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2068,6 +2068,53 @@ EQMP
     },
 
 SQMP
+query-migration-capabilities
+-------
+
+Query migration capabilities
+
+- "xbzrle": xbzrle support
+
+Arguments:
+
+Example:
+
+-> { "execute": "query-migration-capabilities"}
+<- { "return": { "xbzrle" }
+
+EQMP
+
+    {
+        .name       = "query-migration-capabilities",
+        .args_type  = "",
+	.mhandler.cmd_new = qmp_marshal_input_query_migration_capabilities,
+    },
+
+SQMP
+migrate_set_parameter
+-------
+
+Enable migration parameter
+
+- "xbzrle": xbzrle support
+
+Arguments:
+
+Example:
+
+-> { "execute": "migrate_set_parameter" , "arguments": { "parameter": xbzrle"} }
+
+EQMP
+
+    {
+        .name       = "migrate_set_parameter",
+        .args_type  = "parameter:s",
+	.mhandler.cmd_new = qmp_marshal_input_migrate_set_parameter,
+    },
+
+
+
+SQMP
 query-balloon
 -------------
 
diff --git a/savevm.c b/savevm.c
index dd66f2c..42937a0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1711,7 +1711,7 @@ static int qemu_savevm_state(QEMUFile *f)
     int ret;
     MigrationParams params = {
         .blk = 0,
-        .shared = 0
+        .shared = 0,
     };
 
     if (qemu_savevm_state_blocked(NULL)) {
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata <yamahata@valinux.co.jp> Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 2/9] Add migration capabilites Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  2012-05-16 16:28   ` Eric Blake
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 4/9] Add cache handling functions Orit Wasserman
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, stefanha, mdroth, blauwirbel, Orit Wasserman,
	avi, pbonzini

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 docs/xbzrle.txt |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 97 insertions(+), 0 deletions(-)
 create mode 100644 docs/xbzrle.txt

diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
new file mode 100644
index 0000000..aafdb84
--- /dev/null
+++ b/docs/xbzrle.txt
@@ -0,0 +1,97 @@
+XBZRLE (Xor Based Zero Run Length Encoding)
+===========================================
+
+Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction of VM downtime
+and the total live-migration time of Virtual machines. It is particularly useful for virtual machines running memory write intensive workloads that are typical of large enterprise applications such as SAP ERP Systems, and generally
+speaking for any application that uses a sparse memory update pattern.
+
+Instead of sending the changed guest memory page this solution will send a compressed version of the updates, thus reducing the amount of data sent during live migration.
+In order to be able to calculate the update, the previous memory pages needed to be stored. Those pages are stored in a dedicated cache (hash table) and are accessed by their address.
+The larger the cache size the better the chances are that the page has already been stored in the cache. A Small cache size will result in high cache miss rate.
+
+Format
+=======
+
+The compression format uses the zero value, where zero represents an unchanged value.
+The page data delta is represented by zero and non zero runs.
+A zero run is represented by it's length (in bytes).
+A non zero run is represented by it's length (in bytes) and the data.
+The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+
+page = zrun nzrun
+       | zrun nzrun page
+
+zrun = length
+
+nzrun = length byte...
+
+length = uleb128 encoded integer
+
+On the sender side XBZRLE is used as a compact delta encoding of page updates,
+retrieving the old page content from the cache (default size of 512 MB). The
+receiving side uses the existing page's content and XBZRLE to decode the new page's content.
+
+This is a more compact way to store the deltas than the previous version.
+
+This work was originally based on research results published VEE 2011: Evaluation of
+Delta Compression Techniques for Efficient Live Migration of Large Virtual
+Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
+XBRLE was improved further using the XBZRLE instead.
+
+XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
+ideal for in-line, real-time encoding such as is needed for live-migration.
+
+Migration Capabilities
+======================
+In order to use XBZRLE the destination QEMU version should be able to
+decode the new format.
+Adding a new migration capabilities command that will allow external management
+to query for it support.
+A typical use for the destination
+    {qemu} info migrate_capabilities
+    {qemu} xbzrle, ...
+
+In order to enable capabilities for future live migration,
+a new command migrate_set_parameter is introduced:
+    {qemu} migrate_set_parameter xbzrle
+
+Usage
+======
+
+1. Activate xbzrle
+2. Set the XBZRLE cache size - the cache size is in MBytes and should be a power of 2. The cache default value is 64MBytes.
+3. start outgoing migration
+
+A typical usage scenario:
+    {qemu} migrate_set_parameter xbzrle
+    {qemu} migrate_set_cachesize 256m
+    {qemu} migrate -d tcp:destination.host:4444
+    {qemu} info migrate
+    ...
+    transferred ram-duplicate: A kbytes
+    transferred ram-normal: B kbytes
+    transferred ram-xbrle: C kbytes
+    overflow ram-xbrle: D pages
+    cache-miss ram-xbrle: E pages
+
+cache-miss: the number of cache misses to date - high cache-miss rate
+indicates that the cache size is set too low.
+overflow: the number of overflows in the decoding which where the delta could not be compressed. This can happen if the changes in the pages are too large
+or there are many short changes for example change every second byte (half a page).
+
+Testing: Testing indicated that live migration with XBZRLE was completed in 110 seconds, whereas without it would not be able to complete.
+
+A simple synthetic memory r/w load generator:
+..    include <stdlib.h>
+..    include <stdio.h>
+..    int main()
+..    {
+..        char *buf = (char *) calloc(4096, 4096);
+..        while (1) {
+..            int i;
+..            for (i = 0; i < 4096 * 4; i++) {
+..                buf[i * 4096 / 4]++;
+..            }
+..            printf(".");
+..        }
+..    }
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation Orit Wasserman
@ 2012-05-16 16:28   ` Eric Blake
  2012-05-16 16:53     ` Orit Wasserman
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2012-05-16 16:28 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: aliguori, quintela, stefanha, mdroth, qemu-devel, blauwirbel, avi,
	pbonzini

[-- Attachment #1: Type: text/plain, Size: 2792 bytes --]

On 05/16/2012 05:59 AM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  docs/xbzrle.txt |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 97 insertions(+), 0 deletions(-)
>  create mode 100644 docs/xbzrle.txt
> 
> diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
> new file mode 100644
> index 0000000..aafdb84
> --- /dev/null
> +++ b/docs/xbzrle.txt
> @@ -0,0 +1,97 @@
> +XBZRLE (Xor Based Zero Run Length Encoding)
> +===========================================
> +
> +Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction of VM downtime
> +and the total live-migration time of Virtual machines. It is particularly useful for virtual machines running memory write intensive workloads that are typical of large enterprise applications such as SAP ERP Systems, and generally

Any reason you aren't wrapping at column 80?

> +speaking for any application that uses a sparse memory update pattern.
> +
> +Instead of sending the changed guest memory page this solution will send a compressed version of the updates, thus reducing the amount of data sent during live migration.
> +In order to be able to calculate the update, the previous memory pages needed to be stored. Those pages are stored in a dedicated cache (hash table) and are accessed by their address.
> +The larger the cache size the better the chances are that the page has already been stored in the cache. A Small cache size will result in high cache miss rate.

s/Small/small/

> +Usage
> +======
> +
> +1. Activate xbzrle
> +2. Set the XBZRLE cache size - the cache size is in MBytes and should be a power of 2. The cache default value is 64MBytes.
> +3. start outgoing migration
> +
> +A typical usage scenario:
> +    {qemu} migrate_set_parameter xbzrle
> +    {qemu} migrate_set_cachesize 256m
> +    {qemu} migrate -d tcp:destination.host:4444
> +    {qemu} info migrate
> +    ...
> +    transferred ram-duplicate: A kbytes
> +    transferred ram-normal: B kbytes
> +    transferred ram-xbrle: C kbytes
> +    overflow ram-xbrle: D pages
> +    cache-miss ram-xbrle: E pages
> +
> +cache-miss: the number of cache misses to date - high cache-miss rate
> +indicates that the cache size is set too low.
> +overflow: the number of overflows in the decoding which where the delta could not be compressed. This can happen if the changes in the pages are too large
> +or there are many short changes for example change every second byte (half a page).

Can cachesize be modified during an in-progress migration?  Do both
source and destination need to agree on cache size?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation
  2012-05-16 16:28   ` Eric Blake
@ 2012-05-16 16:53     ` Orit Wasserman
  0 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 16:53 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, quintela, stefanha, mdroth, qemu-devel, blauwirbel, avi,
	pbonzini

On 05/16/2012 07:28 PM, Eric Blake wrote:
> On 05/16/2012 05:59 AM, Orit Wasserman wrote:
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>> ---
>>  docs/xbzrle.txt |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 97 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/xbzrle.txt
>>
>> diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
>> new file mode 100644
>> index 0000000..aafdb84
>> --- /dev/null
>> +++ b/docs/xbzrle.txt
>> @@ -0,0 +1,97 @@
>> +XBZRLE (Xor Based Zero Run Length Encoding)
>> +===========================================
>> +
>> +Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction of VM downtime
>> +and the total live-migration time of Virtual machines. It is particularly useful for virtual machines running memory write intensive workloads that are typical of large enterprise applications such as SAP ERP Systems, and generally
> 
> Any reason you aren't wrapping at column 80?
I will fix it.
> 
>> +speaking for any application that uses a sparse memory update pattern.
>> +
>> +Instead of sending the changed guest memory page this solution will send a compressed version of the updates, thus reducing the amount of data sent during live migration.
>> +In order to be able to calculate the update, the previous memory pages needed to be stored. Those pages are stored in a dedicated cache (hash table) and are accessed by their address.
>> +The larger the cache size the better the chances are that the page has already been stored in the cache. A Small cache size will result in high cache miss rate.
> 
> s/Small/small/
> 
>> +Usage
>> +======
>> +
>> +1. Activate xbzrle
>> +2. Set the XBZRLE cache size - the cache size is in MBytes and should be a power of 2. The cache default value is 64MBytes.
>> +3. start outgoing migration
>> +
>> +A typical usage scenario:
>> +    {qemu} migrate_set_parameter xbzrle
>> +    {qemu} migrate_set_cachesize 256m
>> +    {qemu} migrate -d tcp:destination.host:4444
>> +    {qemu} info migrate
>> +    ...
>> +    transferred ram-duplicate: A kbytes
>> +    transferred ram-normal: B kbytes
>> +    transferred ram-xbrle: C kbytes
>> +    overflow ram-xbrle: D pages
>> +    cache-miss ram-xbrle: E pages
>> +
>> +cache-miss: the number of cache misses to date - high cache-miss rate
>> +indicates that the cache size is set too low.
>> +overflow: the number of overflows in the decoding which where the delta could not be compressed. This can happen if the changes in the pages are too large
>> +or there are many short changes for example change every second byte (half a page).
> 
> Can cachesize be modified during an in-progress migration?  Do both
> source and destination need to agree on cache size?
Yes, you can resize the cache  during ongoing migration.
Only the source QEMU uses the cache.

Orit
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 4/9] Add cache handling functions
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (2 preceding siblings ...)
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 5/9] Add uleb encoding/decoding functions Orit Wasserman
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, Petter Svard, stefanha, mdroth, blauwirbel,
	Orit Wasserman, Benoit Hudzia, avi, pbonzini, Aidan Shribman

Add LRU page cache mechanism.
The page are accessed by their address.

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 Makefile.objs        |    1 +
 cache.c              |  212 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/qemu/cache.h |   81 +++++++++++++++++++
 3 files changed, 294 insertions(+), 0 deletions(-)
 create mode 100644 cache.c
 create mode 100644 include/qemu/cache.h

diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..8fed055 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -137,6 +137,7 @@ common-obj-y += qdev.o qdev-properties.o qdev-monitor.o
 common-obj-y += block-migration.o iohandler.o
 common-obj-y += pflib.o
 common-obj-y += bitmap.o bitops.o
+common-obj-y += cache.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/cache.c b/cache.c
new file mode 100644
index 0000000..c300fa6
--- /dev/null
+++ b/cache.c
@@ -0,0 +1,212 @@
+/*
+ * Page cache for qemu
+ * The cache is base on a hash on the page address
+ *
+ * Copyright 2011 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Orit Wasserman  <owasserm@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <strings.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <stdbool.h>
+#include <glib.h>
+
+#include "qemu/cache.h"
+
+#ifdef DEBUG_CACHE
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stdout, "cache: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+typedef struct CacheItem CacheItem;
+
+struct CacheItem {
+    uint64_t it_addr;
+    unsigned long it_age;
+    uint8_t *it_data;
+};
+
+struct Cache {
+    CacheItem *page_cache;
+    unsigned int page_size;
+    int64_t max_num_items;
+    uint64_t max_item_age;
+    int64_t num_items;
+};
+
+Cache *cache_init(int64_t num_pages, unsigned int page_size)
+{
+    int i;
+
+    Cache *cache = g_malloc(sizeof(Cache));
+    if (!cache) {
+        DPRINTF("Error allocation Cache\n");
+        return NULL;
+    }
+
+    if (num_pages <= 0) {
+        DPRINTF("invalid number pages\n");
+        return NULL;
+    }
+
+    cache->page_size = page_size;
+    cache->num_items = 0;
+    cache->max_item_age = 0;
+    cache->max_num_items = num_pages;
+
+    DPRINTF("Setting cache buckets to %lu\n", cache->max_num_items);
+
+    cache->page_cache = g_malloc((cache->max_num_items) *
+                                 sizeof(CacheItem));
+    if (!cache->page_cache) {
+        DPRINTF("could not allocate cache\n");
+        g_free(cache);
+        return NULL;
+    }
+
+    for (i = 0; i < cache->max_num_items; i++) {
+        cache->page_cache[i].it_data = NULL;
+        cache->page_cache[i].it_age = 0;
+        cache->page_cache[i].it_addr = -1;
+    }
+
+    return cache;
+}
+
+void cache_fini(Cache *cache)
+{
+    int i;
+
+    g_assert(cache);
+    g_assert(cache->page_cache);
+
+    for (i = 0; i < cache->max_num_items; i++) {
+        g_free(cache->page_cache[i].it_data);
+        cache->page_cache[i].it_data = 0;
+    }
+
+    g_free(cache->page_cache);
+    cache->page_cache = NULL;
+}
+
+static unsigned long cache_get_cache_pos(const Cache *cache, uint64_t address)
+{
+    unsigned long pos;
+
+    g_assert(cache->max_num_items);
+    pos = (address/cache->page_size) & (cache->max_num_items - 1);
+    return pos;
+}
+
+bool cache_is_cached(const Cache *cache, uint64_t addr)
+{
+    unsigned long pos;
+
+    g_assert(cache);
+    g_assert(cache->page_cache);
+
+    pos = cache_get_cache_pos(cache, addr);
+
+    return (cache->page_cache[pos].it_addr == addr);
+}
+
+static CacheItem *cache_get_by_addr(const Cache *cache, uint64_t addr)
+{
+    unsigned long pos;
+
+    g_assert(cache);
+    g_assert(cache->page_cache);
+
+    pos = cache_get_cache_pos(cache, addr);
+
+    return &cache->page_cache[pos];
+}
+
+uint8_t *get_cached_data(const Cache *cache, uint64_t addr)
+{
+    return cache_get_by_addr(cache, addr)->it_data;
+}
+
+void cache_insert(Cache *cache, unsigned long addr, uint8_t *pdata)
+{
+
+    CacheItem *it = NULL;
+
+    g_assert(cache);
+    g_assert(cache->page_cache);
+
+    /* actual update of entry */
+    it = cache_get_by_addr(cache, addr);
+
+    if (!it->it_data) {
+        cache->num_items++;
+    }
+
+    it->it_data = pdata;
+    it->it_age = ++cache->max_item_age;
+    it->it_addr = addr;
+}
+
+int cache_resize(Cache *cache, int64_t new_num_pages)
+{
+    Cache *new_cache;
+    int i;
+
+    CacheItem *old_it, *new_it;
+
+    g_assert(cache);
+
+    /* same size */
+    if (new_num_pages == cache->max_num_items) {
+        return 0;
+    }
+
+    /* cache was not inited */
+    if (cache->page_cache == NULL) {
+        return -1;
+    }
+
+    new_cache = cache_init(new_num_pages, cache->page_size);
+    if (!(new_cache)) {
+        DPRINTF("Error creating new cache\n");
+        return -1;
+    }
+
+    /* move all data from old cache */
+    for (i = 0; i < cache->max_num_items; i++) {
+        old_it = &cache->page_cache[i];
+        if (old_it->it_addr != -1) {
+            /* check for collision , if there  is keep the first value */
+            new_it = cache_get_by_addr(new_cache, old_it->it_addr);
+            if (new_it->it_data) {
+                g_free(old_it->it_data);
+            } else {
+                cache_insert(new_cache, old_it->it_addr, old_it->it_data);
+            }
+        }
+    }
+
+    cache->page_cache = new_cache->page_cache;
+    cache->max_num_items = new_cache->max_num_items;
+    cache->num_items = new_cache->num_items;
+
+    g_free(new_cache);
+
+    return 0;
+}
diff --git a/include/qemu/cache.h b/include/qemu/cache.h
new file mode 100644
index 0000000..16145e1
--- /dev/null
+++ b/include/qemu/cache.h
@@ -0,0 +1,81 @@
+/*
+ * Page cache for qemu
+ * The cache is base on a hash on the page address
+ *
+ * Copyright 2011 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Orit Wasserman  <owasserm@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#ifndef CACHE_H
+#define CACHE_H
+
+/* Page cache for storing guest pages */
+typedef struct Cache Cache;
+
+/**
+ * cache_init: Initialize the page cache
+ *
+ *
+ * Returns new allocated cache or NULL on error
+ *
+ * @cache pointer to the Cache struct
+ * @num_pages: cache maximal number of cached pages
+ * @page_size: cache page size
+ */
+Cache *cache_init(int64_t num_pages, unsigned int page_size);
+
+/**
+ * cache_fini: free all cache resources
+ * @cache pointer to the Cache struct
+ */
+void cache_fini(Cache *cache);
+
+/**
+ * cache_is_cached: Checks to see if the page is cached
+ *
+ * Returns %true if page is cached
+ *
+ * @cache pointer to the Cache struct
+ * @addr: page addr
+ */
+bool cache_is_cached(const Cache *cache, uint64_t addr);
+
+/**
+ * get_cached_data: Get the data cached for an addr
+ *
+ * Returns pointer to the data cached or NULL if not cached
+ *
+ * @cache pointer to the Cache struct
+ * @addr: page addr
+ */
+uint8_t *get_cached_data(const Cache *cache, uint64_t addr);
+
+/**
+ * cache_insert: insert the page into the cache. the previous value will be overwritten
+ *
+ * @cache pointer to the Cache struct
+ * @addr: page address
+ * @pdata: pointer to the page
+ */
+void cache_insert(Cache *cache, uint64_t addr, uint8_t *pdata);
+
+/**
+ * cache_resize: resize the page cache. In case of size reduction the extra pages
+ * will be freed
+ *
+ * Returns -1 on error
+ *
+ * @cache pointer to the Cache struct
+ * @num_pages: new page cache size (in pages)
+ */
+int cache_resize(Cache *cache, int64_t num_pages);
+
+#endif
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 5/9] Add uleb encoding/decoding functions
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (3 preceding siblings ...)
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 4/9] Add cache handling functions Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 6/9] Add save_block_hdr function Orit Wasserman
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, stefanha, mdroth, blauwirbel, Orit Wasserman,
	avi, pbonzini

Implement Unsigned Little Endian Base 128.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 cutils.c      |   29 +++++++++++++++++++++++++++++
 qemu-common.h |    8 ++++++++
 2 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/cutils.c b/cutils.c
index af308cd..60fb7c8 100644
--- a/cutils.c
+++ b/cutils.c
@@ -549,3 +549,32 @@ int qemu_sendv(int sockfd, struct iovec *iov, int len, int iov_offset)
     return do_sendv_recvv(sockfd, iov, len, iov_offset, 1);
 }
 
+/*
+ * Implementation of  ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+ * Input is limited to 14-bit numbers
+ */
+int uleb128_encode_small(uint8_t *out, uint32_t n)
+{
+    g_assert(n <= 0x3fff);
+    if (n < 0x80) {
+        *out++ = n;
+        return 1;
+    } else {
+        *out++ = (n & 0x7f) | 0x80;
+        *out++ = n >> 7;
+        return 2;
+    }
+}
+
+int uleb128_decode_small(const uint8_t *in, uint32_t *n)
+{
+    if (!(*in & 0x80)) {
+        *n = *in++;
+        return 1;
+    } else {
+        *n = *in++ & 0x7f;
+        g_assert(!(*in & 0x80));
+        *n |= *in++ << 7;
+        return 2;
+    }
+}
diff --git a/qemu-common.h b/qemu-common.h
index 30c59c8..3d0f66f 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -407,4 +407,12 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 
 #include "module.h"
 
+/*
+ * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+ * Input is limited to 14-bit numbers
+ */
+
+int uleb128_encode_small(uint8_t *out, uint32_t n);
+int uleb128_decode_small(const uint8_t *in, uint32_t *n);
+
 #endif
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 6/9] Add save_block_hdr function
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (4 preceding siblings ...)
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 5/9] Add uleb encoding/decoding functions Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 7/9] Add XBZRLE to ram_save_block and ram_save_live Orit Wasserman
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, Petter Svard, stefanha, mdroth, blauwirbel,
	Orit Wasserman, Benoit Hudzia, avi, pbonzini, Aidan Shribman

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c |   26 ++++++++++++++------------
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9a35aee..a334a2e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -127,6 +127,18 @@ static int is_dup_page(uint8_t *page)
     return 1;
 }
 
+static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
+        int cont, int flag)
+{
+        qemu_put_be64(f, offset | cont | flag);
+        if (!cont) {
+                qemu_put_byte(f, strlen(block->idstr));
+                qemu_put_buffer(f, (uint8_t *)block->idstr,
+                                strlen(block->idstr));
+        }
+
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -153,21 +165,11 @@ static int ram_save_block(QEMUFile *f)
             p = memory_region_get_ram_ptr(mr) + offset;
 
             if (is_dup_page(p)) {
-                qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
-                if (!cont) {
-                    qemu_put_byte(f, strlen(block->idstr));
-                    qemu_put_buffer(f, (uint8_t *)block->idstr,
-                                    strlen(block->idstr));
-                }
+                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
                 qemu_put_byte(f, *p);
                 bytes_sent = 1;
             } else {
-                qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
-                if (!cont) {
-                    qemu_put_byte(f, strlen(block->idstr));
-                    qemu_put_buffer(f, (uint8_t *)block->idstr,
-                                    strlen(block->idstr));
-                }
+                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
                 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
                 bytes_sent = TARGET_PAGE_SIZE;
             }
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 7/9] Add XBZRLE to ram_save_block and ram_save_live
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (5 preceding siblings ...)
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 6/9] Add save_block_hdr function Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command Orit Wasserman
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 9/9] Add XBZRLE statistics Orit Wasserman
  8 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, Petter Svard, stefanha, mdroth, blauwirbel,
	Orit Wasserman, Benoit Hudzia, avi, pbonzini, Aidan Shribman

In the outgoing migration check to see if the page is cached and
changed than send compressed page by using save_xbrle_page function.
In the incoming migration check to see if RAM_SAVE_FLAG_XBRLE is set
and decompress the page (by using load_xbrle function).

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c |  220 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 migration.c |   26 +++++++-
 migration.h |    8 ++
 savevm.c    |   91 ++++++++++++++++++++++++
 4 files changed, 329 insertions(+), 16 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a334a2e..7ebdb7a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -43,6 +43,15 @@
 #include "hw/smbios.h"
 #include "exec-memory.h"
 #include "hw/pcspk.h"
+#include "qemu/cache.h"
+
+#ifdef DEBUG_ARCH_INIT
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
 
 #ifdef TARGET_SPARC
 int graphic_width = 1024;
@@ -95,6 +104,7 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_PAGE     0x08
 #define RAM_SAVE_FLAG_EOS      0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
+#define RAM_SAVE_FLAG_XBZRLE   0x40
 
 #ifdef __ALTIVEC__
 #include <altivec.h>
@@ -127,6 +137,22 @@ static int is_dup_page(uint8_t *page)
     return 1;
 }
 
+/* XBZRLE (Xor Based Zero Length Encoding */
+typedef struct XBZRLEHeader {
+    uint32_t xh_cksum;
+    uint16_t xh_len;
+    uint8_t xh_flags;
+} XBZRLEHeader;
+
+/* struct contains XBZRLE cache and a static page
+   used by the compression */
+static struct {
+    /* buffer used for XBZRLE encoding */
+    uint8_t *encoded_buf;
+    /* Cache for XBZRLE */
+    Cache *cache;
+} XBZRLE = {0};
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
         int cont, int flag)
 {
@@ -139,19 +165,78 @@ static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
 
 }
 
+#define ENCODING_FLAG_XBZRLE 0x1
+
+static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
+                            ram_addr_t current_addr, RAMBlock *block,
+                            ram_addr_t offset, int cont)
+{
+    int encoded_len = 0, bytes_sent = -1, ret = -1;
+    XBZRLEHeader hdr = {0};
+    uint8_t *prev_cached_page;
+
+    /* check to see if page is cached , if not cache and return */
+    if (!cache_is_cached(XBZRLE.cache, current_addr)) {
+        cache_insert(XBZRLE.cache, current_addr, g_memdup(current_data,
+                                                          TARGET_PAGE_SIZE));
+        goto done;
+    }
+
+    prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
+
+    /* XBZRLE encoding (if there is no overflow) */
+    encoded_len = xbzrle_encode_buffer(prev_cached_page, current_data,
+                                       TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
+                                       TARGET_PAGE_SIZE);
+    if (encoded_len == 0) {
+        bytes_sent = 0;
+        DPRINTF("Unmodifed page or overflow skipping\n");
+        goto done;
+    } else if (encoded_len == -1) {
+        bytes_sent = -1;
+        DPRINTF("Overflow\n");
+        /* update data in the cache */
+        memcpy(prev_cached_page, current_data, TARGET_PAGE_SIZE);
+        goto done;
+    }
+
+    /* we need to update the data in the cache, in order to get the same data
+       we cached we decode the encoded page on the cached data */
+    ret = xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len,
+                               prev_cached_page, TARGET_PAGE_SIZE);
+    g_assert(ret != -1);
+
+    hdr.xh_len = encoded_len;
+    hdr.xh_flags |= ENCODING_FLAG_XBZRLE;
+
+    /* Send XBZRLE based compressed page */
+    save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBZRLE);
+    qemu_put_byte(f, hdr.xh_flags);
+    qemu_put_be16(f, hdr.xh_len);
+    qemu_put_be32(f, hdr.xh_cksum);
+    qemu_put_buffer(f, XBZRLE.encoded_buf, encoded_len);
+    bytes_sent = encoded_len + sizeof(hdr);
+
+done:
+    return bytes_sent;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
-static int ram_save_block(QEMUFile *f)
+static int ram_save_block(QEMUFile *f, int stage)
 {
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
-    int bytes_sent = 0;
+    int bytes_sent = -1;
     MemoryRegion *mr;
+    ram_addr_t current_addr;
 
     if (!block)
         block = QLIST_FIRST(&ram_list.blocks);
 
+    current_addr = block->offset + offset;
+
     do {
         mr = block->mr;
         if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
@@ -168,7 +253,22 @@ static int ram_save_block(QEMUFile *f)
                 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
                 qemu_put_byte(f, *p);
                 bytes_sent = 1;
-            } else {
+            } else if (migrate_use_xbzrle()) {
+                /* in stage 1 none of the pages are cached so we just want to
+                   cache them for next stages, and send the cached copy */
+                if (stage == 1) {
+                    cache_insert(XBZRLE.cache, current_addr,
+                                 g_memdup(p, TARGET_PAGE_SIZE));
+                } else {
+                    bytes_sent = save_xbzrle_page(f, p, current_addr, block,
+                                                  offset, cont);
+                }
+                /* always send the cached page copy */
+                p = get_cached_data(XBZRLE.cache, current_addr);
+            }
+
+            /* either we didn't send yet (we may got XBZRLE overflow) */
+            if (bytes_sent == -1) {
                 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
                 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
                 bytes_sent = TARGET_PAGE_SIZE;
@@ -262,6 +362,17 @@ static void sort_ram_list(void)
     g_free(blocks);
 }
 
+static void migration_end(void)
+{
+    memory_global_dirty_log_stop();
+
+    if (migrate_use_xbzrle()) {
+        cache_fini(XBZRLE.cache);
+        g_free(XBZRLE.cache);
+        XBZRLE.cache = NULL;
+    }
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
     ram_addr_t addr;
@@ -271,7 +382,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     int ret;
 
     if (stage < 0) {
-        memory_global_dirty_log_stop();
+        migration_end();
         return 0;
     }
 
@@ -284,6 +395,16 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
         last_offset = 0;
         sort_ram_list();
 
+        if (migrate_use_xbzrle()) {
+            XBZRLE.cache = cache_init(migrate_xbzrle_cache_size(),
+                                      TARGET_PAGE_SIZE);
+            if (!XBZRLE.cache) {
+                DPRINTF("Error creating cache\n");
+                return -1;
+            }
+            XBZRLE.encoded_buf = g_malloc0(TARGET_PAGE_SIZE);
+        }
+
         /* Make sure all dirty bits are set */
         QLIST_FOREACH(block, &ram_list.blocks, next) {
             for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
@@ -311,9 +432,12 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     while ((ret = qemu_file_rate_limit(f)) == 0) {
         int bytes_sent;
 
-        bytes_sent = ram_save_block(f);
-        bytes_transferred += bytes_sent;
-        if (bytes_sent == 0) { /* no more blocks */
+        bytes_sent = ram_save_block(f, stage);
+        /* bytes_sent 0 represent unchanged page,
+           bytes_sent -1 represent no more blocks*/
+        if (bytes_sent > 0) {
+            bytes_transferred += bytes_sent;
+        } else if (bytes_sent == -1) { /* no more blocks */
             break;
         }
     }
@@ -336,19 +460,62 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
         int bytes_sent;
 
         /* flush all remaining blocks regardless of rate limiting */
-        while ((bytes_sent = ram_save_block(f)) != 0) {
+        while ((bytes_sent = ram_save_block(f, stage)) != -1) {
             bytes_transferred += bytes_sent;
         }
-        memory_global_dirty_log_stop();
+        migration_end();
     }
 
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
 
+    DPRINTF("ram_save_live: expected(%ld) <= max(%ld)?\n", expected_time,
+        migrate_max_downtime());
+
     return (stage == 2) && (expected_time <= migrate_max_downtime());
 }
 
+static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
+{
+    int ret, rc = 0;
+    uint8_t *xbzrle_buf = NULL;
+    XBZRLEHeader hdr = {0};
+
+    /* extract RLE header */
+    hdr.xh_flags = qemu_get_byte(f);
+    hdr.xh_len = qemu_get_be16(f);
+    hdr.xh_cksum = qemu_get_be32(f);
+
+    if (!(hdr.xh_flags & ENCODING_FLAG_XBZRLE)) {
+        fprintf(stderr, "Failed to load XBZRLE page - wrong compression!\n");
+        return -1;
+    }
+
+    if (hdr.xh_len > TARGET_PAGE_SIZE) {
+        fprintf(stderr, "Failed to load XBZRLE page - len overflow!\n");
+        return -1;
+    }
+
+    /* load data and decode */
+    xbzrle_buf = g_malloc0(TARGET_PAGE_SIZE);
+    qemu_get_buffer(f, xbzrle_buf, hdr.xh_len);
+
+    /* decode RLE */
+    ret = xbzrle_decode_buffer(xbzrle_buf, hdr.xh_len, host, TARGET_PAGE_SIZE);
+    if (ret == -1) {
+        fprintf(stderr, "Failed to load XBZRLE page - decode error!\n");
+        rc = -1;
+    } else  if (ret > TARGET_PAGE_SIZE) {
+        fprintf(stderr, "Failed to load XBZRLE page - size %d exceeds %d!\n",
+                ret, TARGET_PAGE_SIZE);
+        rc = -1;
+    }
+
+    g_free(xbzrle_buf);
+    return rc;
+}
+
 static inline void *host_from_stream_offset(QEMUFile *f,
                                             ram_addr_t offset,
                                             int flags)
@@ -382,8 +549,11 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
-    int flags;
+    int flags, ret = 0;
     int error;
+    static uint64_t seq_iter;
+
+    seq_iter++;
 
     if (version_id < 4 || version_id > 4) {
         return -EINVAL;
@@ -413,8 +583,10 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 
                     QLIST_FOREACH(block, &ram_list.blocks, next) {
                         if (!strncmp(id, block->idstr, sizeof(id))) {
-                            if (block->length != length)
-                                return -EINVAL;
+                            if (block->length != length) {
+                                ret =  -EINVAL;
+                                goto done;
+                            }
                             break;
                         }
                     }
@@ -422,7 +594,8 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
                     if (!block) {
                         fprintf(stderr, "Unknown ramblock \"%s\", cannot "
                                 "accept migration\n", id);
-                        return -EINVAL;
+                        ret = -EINVAL;
+                        goto done;
                     }
 
                     total_ram_bytes -= length;
@@ -451,16 +624,33 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
             void *host;
 
             host = host_from_stream_offset(f, addr, flags);
+            if (!host) {
+                return -EINVAL;
+            }
 
             qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+        } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
+            void *host = host_from_stream_offset(f, addr, flags);
+            if (!host) {
+                return -EINVAL;
+            }
+
+            if (load_xbzrle(f, addr, host) < 0) {
+                ret = -EINVAL;
+                goto done;
+            }
         }
         error = qemu_file_get_error(f);
         if (error) {
-            return error;
+            ret = error;
+            goto done;
         }
     } while (!(flags & RAM_SAVE_FLAG_EOS));
 
-    return 0;
+done:
+    DPRINTF("Completed load of VM with exit code %d seq iteration %ld\n",
+            ret, seq_iter);
+    return ret;
 }
 
 #ifdef HAS_AUDIO
diff --git a/migration.c b/migration.c
index 66e71a3..ba11adb 100644
--- a/migration.c
+++ b/migration.c
@@ -43,6 +43,9 @@ enum {
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
+/* Migration XBZRLE cache size */
+#define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
+
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 
@@ -55,7 +58,8 @@ static MigrationState *migrate_get_current(void)
     static MigrationState current_migration = {
         .state = MIG_STATE_SETUP,
         .bandwidth_limit = MAX_THROTTLE,
-    };
+        .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
+};
 
     return &current_migration;
 }
@@ -410,6 +414,7 @@ static MigrationState *migrate_init(const MigrationParams *params)
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
+    int64_t xbzrle_cache_size = s->xbzrle_cache_size;
 
     memcpy(enabled_capabilities, s->enabled_capabilities,
            sizeof(enabled_capabilities));
@@ -419,6 +424,7 @@ static MigrationState *migrate_init(const MigrationParams *params)
     s->params = *params;
     memcpy(s->enabled_capabilities, enabled_capabilities,
            sizeof(enabled_capabilities));
+    s->xbzrle_cache_size = xbzrle_cache_size;
 
     s->state = MIG_STATE_SETUP;
 
@@ -514,3 +520,21 @@ void qmp_migrate_set_downtime(double value, Error **errp)
     value = MAX(0, MIN(UINT64_MAX, value));
     max_downtime = (uint64_t)value;
 }
+
+int migrate_use_xbzrle(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
+}
+
+int64_t migrate_xbzrle_cache_size(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->xbzrle_cache_size;
+}
diff --git a/migration.h b/migration.h
index e42b0a0..175c729 100644
--- a/migration.h
+++ b/migration.h
@@ -39,6 +39,7 @@ struct MigrationState
     void *opaque;
     MigrationParams params;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
+    int64_t xbzrle_cache_size;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -98,4 +99,11 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
+                         uint8_t *dst, int dlen);
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
+
+int migrate_use_xbzrle(void);
+int64_t migrate_xbzrle_cache_size(void);
+
 #endif
diff --git a/savevm.c b/savevm.c
index 42937a0..31db838 100644
--- a/savevm.c
+++ b/savevm.c
@@ -2374,3 +2374,94 @@ void vmstate_register_ram_global(MemoryRegion *mr)
 {
     vmstate_register_ram(mr, NULL);
 }
+
+/*
+  page = zrun nzrun
+       | zrun nzrun page
+
+  zrun = length
+
+  nzrun = length byte...
+
+  length = uleb128 encoded integer
+ */
+int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
+                         uint8_t *dst, int dlen)
+{
+    uint32_t zrun_len = 0, nzrun_len = 0;
+    int d = 0 , i = 0;
+    uint8_t *nzrun_start = NULL;
+
+    while (i < slen) {
+        /* overflow */
+        if (d + 2 > dlen) {
+            return -1;
+        }
+
+        while (!(old_buf[i] ^ new_buf[i]) && ++i <= slen) {
+            zrun_len++;
+        }
+
+        /* buffer unchanged */
+        if (zrun_len == slen) {
+            return 0;
+        }
+
+        /* skip last zero run */
+        if (i == slen + 1) {
+            return d;
+        }
+
+        d += uleb128_encode_small(dst + d, zrun_len);
+
+        zrun_len = 0;
+        nzrun_start = new_buf + i;
+        while ((old_buf[i] ^ new_buf[i]) != 0 && ++i <= slen) {
+            nzrun_len++;
+        }
+
+        /* overflow */
+        if (d + nzrun_len + 2 > dlen) {
+            return -1;
+        }
+
+        d += uleb128_encode_small(dst + d, nzrun_len);
+        memcpy(dst + d, nzrun_start, nzrun_len);
+        d += nzrun_len;
+        nzrun_len = 0;
+    }
+
+    return d;
+}
+
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
+{
+    int i = 0, d = 0;
+    uint32_t count = 0;
+
+    while (i < slen) {
+
+        /* zrun */
+        i += uleb128_decode_small(src + i, &count);
+        d += count;
+
+        /* overflow */
+        g_assert(d <= dlen);
+
+        /* completed decoding */
+        if (i == slen - 1) {
+            return d;
+        }
+
+        /* nzrun */
+        i += uleb128_decode_small(src + i, &count);
+
+        g_assert(d + count <= dlen);
+
+        memcpy(dst + d , src + i, count);
+        d += count;
+        i += count;
+    }
+
+    return d;
+}
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (6 preceding siblings ...)
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 7/9] Add XBZRLE to ram_save_block and ram_save_live Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  2012-05-16 16:45   ` Eric Blake
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 9/9] Add XBZRLE statistics Orit Wasserman
  8 siblings, 1 reply; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, Petter Svard, stefanha, mdroth, blauwirbel,
	Orit Wasserman, Benoit Hudzia, avi, pbonzini, Aidan Shribman

Change XBZRLE cache size in bytes (the size should be a power of 2).
If XBZRLE cache size is too small there will be many cache miss.

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c      |    7 +++++++
 hmp-commands.hx  |   15 +++++++++++++++
 hmp.c            |   13 +++++++++++++
 hmp.h            |    1 +
 migration.c      |   32 +++++++++++++++++++++++++++++++-
 migration.h      |    2 ++
 qapi-schema.json |   13 +++++++++++++
 qemu-common.h    |    5 +++++
 qmp-commands.hx  |   22 ++++++++++++++++++++++
 9 files changed, 109 insertions(+), 1 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 7ebdb7a..851e45d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -24,6 +24,7 @@
 #include <stdint.h>
 #include <stdarg.h>
 #include <stdlib.h>
+#include <math.h>
 #ifndef _WIN32
 #include <sys/types.h>
 #include <sys/mman.h>
@@ -153,6 +154,12 @@ static struct {
     Cache *cache;
 } XBZRLE = {0};
 
+
+void xbzrle_cache_resize(int64_t order)
+{
+    cache_resize(XBZRLE.cache, pow(2, order));
+}
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
         int cont, int flag)
 {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e14e7be..abc9403 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -829,6 +829,21 @@ STEXI
 @item migrate_cancel
 @findex migrate_cancel
 Cancel the current VM migration.
+
+ETEXI
+
+    {
+        .name       = "migrate_set_cachesize",
+        .args_type  = "value:o",
+        .params     = "value",
+        .help       = "set cache size (in bytes) for XBZRLE migrations. The cache size effects the number of cache misses. In case of a high cache miss ratio you need to increase the cache size",
+        .mhandler.cmd = hmp_migrate_set_cachesize,
+    },
+
+STEXI
+@item migrate_set_cachesize @var{value}
+@findex migrate_set_cache
+Set cache size to @var{value} (in bytes) for xbzrle migrations.
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index e73132b..0e4d63a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -751,6 +751,19 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict)
     qmp_migrate_set_downtime(value, NULL);
 }
 
+void hmp_migrate_set_cachesize(Monitor *mon, const QDict *qdict)
+{
+    int64_t value = qdict_get_int(qdict, "value");
+    Error *err = NULL;
+
+    qmp_migrate_set_cachesize(value, &err);
+    if (err) {
+        monitor_printf(mon, "%s\n", error_get_pretty(err));
+        error_free(err);
+        return;
+    }
+}
+
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict)
 {
     int64_t value = qdict_get_int(qdict, "value");
diff --git a/hmp.h b/hmp.h
index 5f9d842..9559559 100644
--- a/hmp.h
+++ b/hmp.h
@@ -53,6 +53,7 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_cachesize(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration.c b/migration.c
index ba11adb..4fb3b8a 100644
--- a/migration.c
+++ b/migration.c
@@ -22,6 +22,7 @@
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qmp-commands.h"
+#include <math.h>
 
 //#define DEBUG_MIGRATION
 
@@ -43,7 +44,7 @@ enum {
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
-/* Migration XBZRLE cache size */
+/* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
 static NotifierList migration_state_notifiers =
@@ -501,6 +502,35 @@ void qmp_migrate_cancel(Error **errp)
     migrate_fd_cancel(migrate_get_current());
 }
 
+void qmp_migrate_set_cachesize(int64_t value, Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+
+    /* Check for truncation */
+    if (value != (size_t)value) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+                  "exceeding address space");
+        return;
+    }
+
+    value = MIN(UINT64_MAX, value);
+
+    /* no change */
+    if (value == s->xbzrle_cache_size) {
+        return;
+    }
+
+    /* power of 2 */
+    if (value != 1 && !is_power_of_2(value)) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+                  "needs to be power of 2");
+        return;
+    }
+
+    s->xbzrle_cache_size = value;
+    xbzrle_cache_resize(log2(value));
+}
+
 void qmp_migrate_set_speed(int64_t value, Error **errp)
 {
     MigrationState *s;
diff --git a/migration.h b/migration.h
index 175c729..6a5bc0e 100644
--- a/migration.h
+++ b/migration.h
@@ -106,4 +106,6 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
 
+void xbzrle_cache_resize(int64_t new_size);
+
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 2887c51..4d30552 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1357,6 +1357,19 @@
 { 'command': 'migrate_set_speed', 'data': {'value': 'int'} }
 
 ##
+# @migrate_set_cachesize
+#
+# Set XBZRLE cache size
+#
+# @value: cache size in bytes
+#
+# Returns: nothing on success
+#
+# Since: 1.1
+##
+{ 'command': 'migrate_set_cachesize', 'data': {'value': 'int'} }
+
+##
 # @ObjectPropertyInfo:
 #
 # @name: the name of the property
diff --git a/qemu-common.h b/qemu-common.h
index 3d0f66f..da44c17 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -415,4 +415,9 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 int uleb128_encode_small(uint8_t *out, uint32_t n);
 int uleb128_decode_small(const uint8_t *in, uint32_t *n);
 
+static inline bool is_power_of_2(int64_t value)
+{
+    return !(value & (value - 1));
+}
+
 #endif
diff --git a/qmp-commands.hx b/qmp-commands.hx
index f276e08..3bdbaae 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -520,6 +520,28 @@ Example:
 <- { "return": {} }
 
 EQMP
+{
+        .name       = "migrate_set_cachesize",
+        .args_type  = "value:o",
+        .mhandler.cmd_new = qmp_marshal_input_migrate_set_cachesize,
+    },
+
+SQMP
+migrate_set_cachesize
+---------------------
+
+Set cache size to be used by XBZRLE migration
+
+Arguments:
+
+- "value": cache size in bytes (json-int)
+
+Example:
+
+-> { "execute": "migrate_set_cachesize", "arguments": { "value": 512 } }
+<- { "return": {} }
+
+EQMP
 
     {
         .name       = "migrate_set_speed",
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command Orit Wasserman
@ 2012-05-16 16:45   ` Eric Blake
  2012-05-16 17:04     ` Orit Wasserman
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2012-05-16 16:45 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: aliguori, quintela, stefanha, mdroth, qemu-devel, blauwirbel,
	Petter Svard, Benoit Hudzia, avi, pbonzini, Aidan Shribman

[-- Attachment #1: Type: text/plain, Size: 2833 bytes --]

On 05/16/2012 05:59 AM, Orit Wasserman wrote:
> Change XBZRLE cache size in bytes (the size should be a power of 2).
> If XBZRLE cache size is too small there will be many cache miss.
> 
> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
> Signed-off-by: Petter Svard <petters@cs.umu.se>
> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

>  
> +
> +void xbzrle_cache_resize(int64_t order)
> +{
> +    cache_resize(XBZRLE.cache, pow(2, order));

'1 << order' is much more efficient than a call to pow().

> +void qmp_migrate_set_cachesize(int64_t value, Error **errp)

> +
> +    /* power of 2 */
> +    if (value != 1 && !is_power_of_2(value)) {
> +        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
> +                  "needs to be power of 2");

We already have QERR_PROPERTY_VALUE_NOT_POWER_OF_2, why aren't you using
that here?

> +        return;
> +    }
> +
> +    s->xbzrle_cache_size = value;
> +    xbzrle_cache_resize(log2(value));

log2() is rather expensive, ffs() from <strings.h> is more efficient at
converting a single bit into the appropriate order.

>  ##
> +# @migrate_set_cachesize
> +#
> +# Set XBZRLE cache size
> +#
> +# @value: cache size in bytes
> +#
> +# Returns: nothing on success

Document the error for a non-power-of-2 or for overflow.

Document whether this command is safe for an ongoing migration, or
whether it must be called in advance of a migration.

> +#
> +# Since: 1.1

1.2.

> +static inline bool is_power_of_2(int64_t value)
> +{
> +    return !(value & (value - 1));
> +}

This says '0' is a power of 2, which is not true.  Either fix the logic
to exclude 0, or fix the function name to state that you are really
checking that at most one bit is set.

Also, if value is 0x8000000000000000, you are triggering unspecified
behavior per C99.  Is it worth using uint64_t for defined behavior, or
do you need to take precautions regarding negative values?

> +SQMP
> +migrate_set_cachesize
> +---------------------
> +
> +Set cache size to be used by XBZRLE migration
> +
> +Arguments:
> +
> +- "value": cache size in bytes (json-int)

Would it be any easier to take 'order' (log2 of the size) instead of the
actual cache size?  That is, instead of calling "value":1048576, I would
rather type "value":20.

> +
> +Example:
> +
> +-> { "execute": "migrate_set_cachesize", "arguments": { "value": 512 } }

Isn't 512 bytes rather small?  And given my argument about taking order
rather than bytes as being easier to use, don't you really mean 512
megabytes (order 29) rather than 512 bytes (order 9)?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command
  2012-05-16 16:45   ` Eric Blake
@ 2012-05-16 17:04     ` Orit Wasserman
  2012-05-16 17:58       ` Eric Blake
  0 siblings, 1 reply; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 17:04 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, quintela, stefanha, mdroth, qemu-devel, blauwirbel,
	Petter Svard, Benoit Hudzia, avi, pbonzini, Aidan Shribman

On 05/16/2012 07:45 PM, Eric Blake wrote:
> On 05/16/2012 05:59 AM, Orit Wasserman wrote:
>> Change XBZRLE cache size in bytes (the size should be a power of 2).
>> If XBZRLE cache size is too small there will be many cache miss.
>>
>> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
>> Signed-off-by: Petter Svard <petters@cs.umu.se>
>> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> 
>>  
>> +
>> +void xbzrle_cache_resize(int64_t order)
>> +{
>> +    cache_resize(XBZRLE.cache, pow(2, order));
> 
> '1 << order' is much more efficient than a call to pow().
ok
> 
> 
>> +void qmp_migrate_set_cachesize(int64_t value, Error **errp)
> 
>> +
>> +    /* power of 2 */
>> +    if (value != 1 && !is_power_of_2(value)) {
>> +        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
>> +                  "needs to be power of 2");
> 
> We already have QERR_PROPERTY_VALUE_NOT_POWER_OF_2, why aren't you using
> that here?
I will update it.
> 
>> +        return;
>> +    }
>> +
>> +    s->xbzrle_cache_size = value;
>> +    xbzrle_cache_resize(log2(value));
> 
> log2() is rather expensive, ffs() from <strings.h> is more efficient at
> converting a single bit into the appropriate order.
ok
> 
> 
>>  ##
>> +# @migrate_set_cachesize
>> +#
>> +# Set XBZRLE cache size
>> +#
>> +# @value: cache size in bytes
>> +#
>> +# Returns: nothing on success
> 
> Document the error for a non-power-of-2 or for overflow.
> 
> Document whether this command is safe for an ongoing migration, or
> whether it must be called in advance of a migration.
sure
> 
>> +#
>> +# Since: 1.1
> 
> 1.2.
> 
> 
>> +static inline bool is_power_of_2(int64_t value)
>> +{
>> +    return !(value & (value - 1));
>> +}
> 
> This says '0' is a power of 2, which is not true.  Either fix the logic
> to exclude 0, or fix the function name to state that you are really
> checking that at most one bit is set.
> 
> Also, if value is 0x8000000000000000, you are triggering unspecified
> behavior per C99.  Is it worth using uint64_t for defined behavior, or
> do you need to take precautions regarding negative values?
The input is int64 so I prefer to keep it this way.
The calling function does the check for 0 , negative numbers and overflow
but I can add those checks here too.

> 
> 
>> +SQMP
>> +migrate_set_cachesize
>> +---------------------
>> +
>> +Set cache size to be used by XBZRLE migration
>> +
>> +Arguments:
>> +
>> +- "value": cache size in bytes (json-int)
> 
> Would it be any easier to take 'order' (log2 of the size) instead of the
> actual cache size?  That is, instead of calling "value":1048576, I would
> rather type "value":20.
Well the user is considering how much memory is going to be used and I though that it
is simpler to use 1G than 30.
But I guess the user is libvirt so it can be changed to order.

> 
>> +
>> +Example:
>> +
>> +-> { "execute": "migrate_set_cachesize", "arguments": { "value": 512 } }
> 
> Isn't 512 bytes rather small?  And given my argument about taking order
> rather than bytes as being easier to use, don't you really mean 512
> megabytes (order 29) rather than 512 bytes (order 9)?
> 
correct 512M not bytes ...

Orit

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command
  2012-05-16 17:04     ` Orit Wasserman
@ 2012-05-16 17:58       ` Eric Blake
  2012-05-17  9:46         ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2012-05-16 17:58 UTC (permalink / raw)
  To: Orit Wasserman
  Cc: aliguori, quintela, stefanha, qemu-devel, mdroth, blauwirbel,
	Petter Svard, Benoit Hudzia, avi, pbonzini, Aidan Shribman

[-- Attachment #1: Type: text/plain, Size: 642 bytes --]

On 05/16/2012 11:04 AM, Orit Wasserman wrote:

>>> +- "value": cache size in bytes (json-int)
>>
>> Would it be any easier to take 'order' (log2 of the size) instead of the
>> actual cache size?  That is, instead of calling "value":1048576, I would
>> rather type "value":20.
> Well the user is considering how much memory is going to be used and I though that it
> is simpler to use 1G than 30.

Libvirt can cope with either style, so maybe it's worth waiting for
anyone else to chime in on which style is easier.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command
  2012-05-16 17:58       ` Eric Blake
@ 2012-05-17  9:46         ` Avi Kivity
  2012-05-17 12:25           ` Orit Wasserman
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2012-05-17  9:46 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, quintela, Petter Svard, stefanha, qemu-devel, mdroth,
	blauwirbel, Orit Wasserman, Benoit Hudzia, pbonzini,
	Aidan Shribman

On 05/16/2012 08:58 PM, Eric Blake wrote:
> On 05/16/2012 11:04 AM, Orit Wasserman wrote:
>
> >>> +- "value": cache size in bytes (json-int)
> >>
> >> Would it be any easier to take 'order' (log2 of the size) instead of the
> >> actual cache size?  That is, instead of calling "value":1048576, I would
> >> rather type "value":20.
> > Well the user is considering how much memory is going to be used and I though that it
> > is simpler to use 1G than 30.
>
> Libvirt can cope with either style, so maybe it's worth waiting for
> anyone else to chime in on which style is easier.

Let's be consistent.  It's best to use bytes everywhere (not kilobytes,
not megabytes, not pages, not order, or anything else we can come up with).

If you really want to specify order (not that I can think of a reason
why), we can use a suffix: 20ORD == 1M == 1048576.

btw, maybe it's better to handle a non-power-of-two cache size by
rounding down.  Less errors, less puzzlement, and less memory used.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command
  2012-05-17  9:46         ` Avi Kivity
@ 2012-05-17 12:25           ` Orit Wasserman
  0 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-17 12:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: aliguori, quintela, stefanha, qemu-devel, mdroth, blauwirbel,
	Petter Svard, Benoit Hudzia, Aidan Shribman, pbonzini, Eric Blake

On 05/17/2012 12:46 PM, Avi Kivity wrote:
> On 05/16/2012 08:58 PM, Eric Blake wrote:
>> On 05/16/2012 11:04 AM, Orit Wasserman wrote:
>>
>>>>> +- "value": cache size in bytes (json-int)
>>>>
>>>> Would it be any easier to take 'order' (log2 of the size) instead of the
>>>> actual cache size?  That is, instead of calling "value":1048576, I would
>>>> rather type "value":20.
>>> Well the user is considering how much memory is going to be used and I though that it
>>> is simpler to use 1G than 30.
>>
>> Libvirt can cope with either style, so maybe it's worth waiting for
>> anyone else to chime in on which style is easier.
> 
> Let's be consistent.  It's best to use bytes everywhere (not kilobytes,
> not megabytes, not pages, not order, or anything else we can come up with).
> 
> If you really want to specify order (not that I can think of a reason
> why), we can use a suffix: 20ORD == 1M == 1048576.
That is what used at the moment.
> 
> btw, maybe it's better to handle a non-power-of-two cache size by
> rounding down.  Less errors, less puzzlement, and less memory used.
Sounds good to me.

Orit
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 9/9] Add XBZRLE statistics
  2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
                   ` (7 preceding siblings ...)
  2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command Orit Wasserman
@ 2012-05-16 11:59 ` Orit Wasserman
  8 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, Petter Svard, stefanha, mdroth, blauwirbel,
	Orit Wasserman, Benoit Hudzia, avi, pbonzini, Aidan Shribman

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c      |   68 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 hmp.c            |   13 ++++++++++
 migration.c      |   12 +++++++++
 migration.h      |    9 +++++++
 qapi-schema.json |   27 +++++++++++++++++++--
 qmp-commands.hx  |   28 ++++++++++++++++++++++
 6 files changed, 153 insertions(+), 4 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 851e45d..1c35b26 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -160,8 +160,66 @@ void xbzrle_cache_resize(int64_t order)
     cache_resize(XBZRLE.cache, pow(2, order));
 }
 
+/* accounting */
+typedef struct AccountingInfo {
+    uint64_t dup_pages;
+    uint64_t norm_pages;
+    uint64_t xbzrle_bytes;
+    uint64_t xbzrle_pages;
+    uint64_t xbzrle_cache_miss;
+    uint64_t iterations;
+    uint64_t xbzrle_overflows;
+} AccountingInfo;
+
+static AccountingInfo acct_info;
+
+static void acct_clear(void)
+{
+    memset(&acct_info, 0, sizeof(acct_info));
+}
+
+uint64_t dup_mig_bytes_transferred(void)
+{
+    return acct_info.dup_pages * TARGET_PAGE_SIZE;
+}
+
+uint64_t dup_mig_pages_transferred(void)
+{
+    return acct_info.dup_pages;
+}
+
+uint64_t norm_mig_bytes_transferred(void)
+{
+    return acct_info.norm_pages * TARGET_PAGE_SIZE;
+}
+
+uint64_t norm_mig_pages_transferred(void)
+{
+    return acct_info.norm_pages;
+}
+
+uint64_t xbzrle_mig_bytes_transferred(void)
+{
+    return acct_info.xbzrle_bytes;
+}
+
+uint64_t xbzrle_mig_pages_transferred(void)
+{
+    return acct_info.xbzrle_pages;
+}
+
+uint64_t xbzrle_mig_pages_cache_miss(void)
+{
+    return acct_info.xbzrle_cache_miss;
+}
+
+uint64_t xbzrle_mig_pages_overflow(void)
+{
+    return acct_info.xbzrle_overflows;
+}
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
-        int cont, int flag)
+                           int cont, int flag)
 {
         qemu_put_be64(f, offset | cont | flag);
         if (!cont) {
@@ -186,6 +244,7 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
     if (!cache_is_cached(XBZRLE.cache, current_addr)) {
         cache_insert(XBZRLE.cache, current_addr, g_memdup(current_data,
                                                           TARGET_PAGE_SIZE));
+        acct_info.xbzrle_cache_miss++;
         goto done;
     }
 
@@ -202,6 +261,7 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
     } else if (encoded_len == -1) {
         bytes_sent = -1;
         DPRINTF("Overflow\n");
+        acct_info.xbzrle_overflows++;
         /* update data in the cache */
         memcpy(prev_cached_page, current_data, TARGET_PAGE_SIZE);
         goto done;
@@ -222,7 +282,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
     qemu_put_be16(f, hdr.xh_len);
     qemu_put_be32(f, hdr.xh_cksum);
     qemu_put_buffer(f, XBZRLE.encoded_buf, encoded_len);
+    acct_info.xbzrle_pages++;
     bytes_sent = encoded_len + sizeof(hdr);
+    acct_info.xbzrle_bytes += bytes_sent;
 
 done:
     return bytes_sent;
@@ -257,6 +319,7 @@ static int ram_save_block(QEMUFile *f, int stage)
             p = memory_region_get_ram_ptr(mr) + offset;
 
             if (is_dup_page(p)) {
+                acct_info.dup_pages++;
                 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
                 qemu_put_byte(f, *p);
                 bytes_sent = 1;
@@ -279,6 +342,7 @@ static int ram_save_block(QEMUFile *f, int stage)
                 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
                 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
                 bytes_sent = TARGET_PAGE_SIZE;
+                acct_info.norm_pages++;
             }
 
             break;
@@ -410,6 +474,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
                 return -1;
             }
             XBZRLE.encoded_buf = g_malloc0(TARGET_PAGE_SIZE);
+            acct_clear();
         }
 
         /* Make sure all dirty bits are set */
@@ -444,6 +509,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
            bytes_sent -1 represent no more blocks*/
         if (bytes_sent > 0) {
             bytes_transferred += bytes_sent;
+            acct_info.iterations++;
         } else if (bytes_sent == -1) { /* no more blocks */
             break;
         }
diff --git a/hmp.c b/hmp.c
index 0e4d63a..8829d3a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -162,6 +162,19 @@ void hmp_info_migrate(Monitor *mon)
                        info->disk->total >> 10);
     }
 
+    if (info->has_cache) {
+        monitor_printf(mon, "cache size: %" PRIu64 " bytes\n",
+                       info->cache->cache_size);
+        monitor_printf(mon, "xbzrle transferred: %" PRIu64 " kbytes\n",
+                       info->cache->xbzrle_bytes >> 10);
+        monitor_printf(mon, "xbzrle pages: %" PRIu64 " pages\n",
+                       info->cache->xbzrle_pages);
+        monitor_printf(mon, "xbzrle cache miss: %" PRIu64 "\n",
+                       info->cache->xbzrle_cache_miss);
+        monitor_printf(mon, "xbzrle overflow : %" PRIu64 "\n",
+                       info->cache->xbzrle_overflow);
+    }
+
     qapi_free_MigrationInfo(info);
 }
 
diff --git a/migration.c b/migration.c
index 4fb3b8a..ca953ec 100644
--- a/migration.c
+++ b/migration.c
@@ -148,6 +148,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->ram->transferred = ram_bytes_transferred();
         info->ram->remaining = ram_bytes_remaining();
         info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->norm  = norm_mig_pages_transferred();
 
         if (blk_mig_active()) {
             info->has_disk = true;
@@ -156,6 +158,16 @@ MigrationInfo *qmp_query_migrate(Error **errp)
             info->disk->remaining = blk_mig_bytes_remaining();
             info->disk->total = blk_mig_bytes_total();
         }
+
+        if (migrate_use_xbzrle()) {
+            info->has_cache = true;
+            info->cache = g_malloc0(sizeof(*info->cache));
+            info->cache->cache_size = migrate_xbzrle_cache_size();
+            info->cache->xbzrle_bytes  = xbzrle_mig_bytes_transferred();
+            info->cache->xbzrle_pages  = xbzrle_mig_pages_transferred();
+            info->cache->xbzrle_cache_miss = xbzrle_mig_pages_cache_miss();
+            info->cache->xbzrle_overflow = xbzrle_mig_pages_overflow();
+        }
         break;
     case MIG_STATE_COMPLETED:
         info->has_status = true;
diff --git a/migration.h b/migration.h
index 6a5bc0e..a342253 100644
--- a/migration.h
+++ b/migration.h
@@ -82,6 +82,15 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+uint64_t dup_mig_bytes_transferred(void);
+uint64_t dup_mig_pages_transferred(void);
+uint64_t norm_mig_bytes_transferred(void);
+uint64_t norm_mig_pages_transferred(void);
+uint64_t xbzrle_mig_bytes_transferred(void);
+uint64_t xbzrle_mig_pages_transferred(void);
+uint64_t xbzrle_mig_pages_overflow(void);
+uint64_t xbzrle_mig_pages_cache_miss(void);
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 4d30552..6f22100 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -241,7 +241,28 @@
 # Since: 0.14.0.
 ##
 { 'type': 'MigrationStats',
-  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' } }
+  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int', 'duplicate': 'int', 'norm': 'int' } }
+
+##
+# @CacheStats
+#
+# Detailed XBZRLE migration cache statistics
+#
+# @cache_size: XBZRLE cache size
+#
+# @xbzrle_bytes: amount of bytes already transferred to the target VM
+#
+# @xbzrle_pages: amount of pages transferred to the target VM
+#
+# @xbzrle_cache_miss: numer of cache miss
+#
+# @xbzrle_overflow: number of overflows
+#
+# Since: 1.1
+##
+{ 'type': 'CacheStats',
+  'data': {'cache_size': 'int', 'xbzrle_bytes': 'int', 'xbzrle_pages': 'int',
+           'xbzrle_cache_miss': 'int', 'xbzrle_overflow': 'int' } }
 
 ##
 # @MigrationInfo
@@ -264,8 +285,8 @@
 ##
 { 'type': 'MigrationInfo',
   'data': {'*status': 'str', '*ram': 'MigrationStats',
-           '*disk': 'MigrationStats', '*params': ['MigrationCapabilityInfo']} }
-
+           '*disk': 'MigrationStats', '*params': ['MigrationCapabilityInfo'],
+           '*cache': 'CacheStats'} }
 ##
 # @query-migrate
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 3bdbaae..6ffc3dc 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2030,6 +2030,13 @@ The main json-object contains the following:
          - "transferred": amount transferred (json-int)
          - "remaining": amount remaining (json-int)
          - "total": total (json-int)
+- "cache": only present if "status" and XBZRLE is active.
+  it is a json-object with the following XBZRLE information:
+         - "cache size": XBZRLE cache size
+     	 - "xbzrle_bytes": total XBZRLE bytes transferred
+	 - "xbzrle_pages": number of XBZRLE compressed pages
+	 - "cache_miss": number of cache misses
+	 - "overflow": number of XBZRLE overflows
 
 Examples:
 
@@ -2081,6 +2088,27 @@ Examples:
       }
    }
 
+5. Migration is being performed and XBZRLE is active:
+
+-> { "execute": "query-migrate" }
+<- {
+      "return":{
+         "status":"active",
+         "ram":{
+            "total":1057024,
+            "remaining":1053304,
+            "transferred":3720
+         },
+         "cache":{
+	    "size": 1024
+            "xbzrle_transferred":20971520,
+	    "xbzrle_pages":2444343,
+	    "xbzrle_cache_miss:2244,
+	    "xbzrle_overflow":34434
+         }
+      }
+   }
+
 EQMP
 
     {
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app
@ 2012-05-16 11:46 Orit Wasserman
  0 siblings, 0 replies; 20+ messages in thread
From: Orit Wasserman @ 2012-05-16 11:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, quintela, Petter Svard, stefanha, mdroth, blauwirbel,
	Orit Wasserman, Benoit Hudzia, avi, pbonzini, Aidan Shribman

Changes from v9:
	- move cache implementation to separate files. Kept our own implementation because GCache or GHashTable have no size limit.
	- Add migrate_set_parameter function
	- removed XBZRLE option from migrate command
	- add cache size information to query_migrate command
	- add documantation file
	- write/read the exact XBZRLE header format
	- fix other review comments by Anthony and Juan

Changes from v8:
	Implement more effiecent cache_resize method
	fix set_cachesize command 

Changes from v7:
	Copy current page before encoding it, this will prevents page content
	change during the encoding.
	Allow changing the cache size during an active migration.
	Fix comments by Avi.

Changes from v6:
 1) add assert checks to ULEB encoding/decoding
 2) no need to send last zero run

Changes from v5:
1) Add migration capabilities
2) Use ULEB to encode run length
3) Do not send unmodified (dirty) page
3) Fix other patch comments

Using GCache or GHashTable requires allocating new buffer on every content change and have no size limit ,
so I decided to keep the simple cache implementation.

Changes from v4:
1) Rebase
2) divide patch into 9 patches
3) move memory allocation into cache_insert

Future work :
     Use SSE for encoding.
     Page ranking acording to their dirty rate and automatic activation/deactivation of the feature - will be sent in a separate patch series.	

By using XBZRLE (Xor Based Zero Run Length Encoding) we can reduce VM downtime
and total live-migration time of VMs running memory write intensive workloads
typical of large enterprise applications such as SAP ERP Systems, and generally
speaking for any application with a sparse memory update pattern.

The compression format uses the fact that we will have many zero (zero represents
an unchanged value). 
We repesent the page data delta by zero and non zero runs.
We represent a zero run with it's length (in bytes). 
We represent a non zero run with it's length (in bytes) and the data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from an LRU cache (default size of 512 MB). The
receiving side uses the existing page content and XBZRLE to decode the new page
content.

This is a more compact way to store the delta than the previous version.

This work was originally based on research results published VEE 2011: Evaluation of
Delta Compression Techniques for Efficient Live Migration of Large Virtual
Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
XBRLE was improved further using XBZRLE instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
ideal for in-line, real-time encoding such as is needed for live-migration.

A typical usage scenario:
    {qemu} migrate_set_cachesize 256m
    {qemu} migrate_set_parameter xbzrle
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    ...
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    cache size: D bytes
    xbzrle transferred: E kbytes
    xbzrle pages: F pages
    xbzrle cache miss: G
    xbzrle overflow : H

Testing: live migration with XBZRLE completed in 110 seconds, without live
migration was not able to complete.

A simple synthetic memory r/w load generator:
..    include <stdlib.h>
..    include <stdio.h>
..    int main()
..    {
..        char *buf = (char *) calloc(4096, 4096);
..        while (1) {
..            int i;
..            for (i = 0; i < 4096 * 4; i++) {
..                buf[i * 4096 / 4]++;
..            }
..            printf(".");
..        }
..    }

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>

Orit Wasserman (9):
  From: Isaku Yamahata <yamahata@valinux.co.jp>
  Add migration capabilites
  Add XBZRLE documentation
  Add cache handling functions
  Add uleb encoding/decoding functions
  Add save_block_hdr function
  Add XBZRLE to ram_save_block and ram_save_live
  Add set_cachesize command
  Add XBZRLE statistics

 Makefile.objs        |    1 +
 arch_init.c          |  317 +++++++++++++++++++++++++++++++++++++++++++++----
 block-migration.c    |    8 +-
 cache.c              |  212 +++++++++++++++++++++++++++++++++
 cutils.c             |   29 +++++
 docs/xbzrle.txt      |   97 +++++++++++++++
 hmp-commands.hx      |   31 +++++
 hmp.c                |   67 +++++++++++
 hmp.h                |    3 +
 include/qemu/cache.h |   81 +++++++++++++
 migration.c          |  134 ++++++++++++++++++++--
 migration.h          |   29 +++++-
 monitor.c            |    7 +
 qapi-schema.json     |   84 +++++++++++++-
 qemu-common.h        |   14 +++
 qmp-commands.hx      |   97 +++++++++++++++
 savevm.c             |  102 ++++++++++++++++-
 sysemu.h             |    3 +-
 vmstate.h            |    2 +-
 19 files changed, 1270 insertions(+), 48 deletions(-)
 create mode 100644 cache.c
 create mode 100644 docs/xbzrle.txt
 create mode 100644 include/qemu/cache.h

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-05-17 12:26 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-16 11:59 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata <yamahata@valinux.co.jp> Orit Wasserman
2012-05-16 12:43   ` Peter Maydell
2012-05-16 16:50     ` [Qemu-devel] [PATCH v10 1/9] Add MigrationParams structure Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 2/9] Add migration capabilites Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation Orit Wasserman
2012-05-16 16:28   ` Eric Blake
2012-05-16 16:53     ` Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 4/9] Add cache handling functions Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 5/9] Add uleb encoding/decoding functions Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 6/9] Add save_block_hdr function Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 7/9] Add XBZRLE to ram_save_block and ram_save_live Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command Orit Wasserman
2012-05-16 16:45   ` Eric Blake
2012-05-16 17:04     ` Orit Wasserman
2012-05-16 17:58       ` Eric Blake
2012-05-17  9:46         ` Avi Kivity
2012-05-17 12:25           ` Orit Wasserman
2012-05-16 11:59 ` [Qemu-devel] [PATCH v10 9/9] Add XBZRLE statistics Orit Wasserman
  -- strict thread matches above, loose matches on Subject: below --
2012-05-16 11:46 [Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app Orit Wasserman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).