* [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal @ 2015-04-17 12:12 Bohdan Trach 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing Bohdan Trach ` (4 more replies) 0 siblings, 5 replies; 23+ messages in thread From: Bohdan Trach @ 2015-04-17 12:12 UTC (permalink / raw) To: qemu-devel; +Cc: Bohdan Trach, amit.shah, thomas.knauth, quintela From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> This patchset contains a checkpoint-assisted migration feature as proposed earlier on this list [1]. It allows reusing existing memory snapshots of guests to speed up migration of VMs between physical hosts. [1] https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01555.html Bohdan Trach (3): memory: Add dump-pc-mem command for checkpointing memory: implement checkpoint handling migration: use checkpoint during migration arch_init.c | 335 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- configure | 2 + hmp-commands.hx | 16 +++ hmp.c | 9 ++ hmp.h | 1 + qapi-schema.json | 11 ++ qemu-options.hx | 9 ++ qmp-commands.hx | 19 ++++ vl.c | 12 ++ 9 files changed, 405 insertions(+), 9 deletions(-) -- 2.0.5 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing 2015-04-17 12:12 [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal Bohdan Trach @ 2015-04-17 12:13 ` Bohdan Trach 2015-04-17 13:53 ` Eric Blake 2015-11-16 16:46 ` Dr. David Alan Gilbert 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 2/3] memory: implement checkpoint handling Bohdan Trach ` (3 subsequent siblings) 4 siblings, 2 replies; 23+ messages in thread From: Bohdan Trach @ 2015-04-17 12:13 UTC (permalink / raw) To: qemu-devel; +Cc: Bohdan Trach, amit.shah, thomas.knauth, quintela From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> dump-pc-mem command is added for checkpointing guest memory to file. Only system RAM region is saved. This checkpoint is later used to recover unchanged pages. Signed-off-by: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> --- arch_init.c | 19 +++++++++++++++++++ hmp-commands.hx | 16 ++++++++++++++++ hmp.c | 9 +++++++++ hmp.h | 1 + qapi-schema.json | 11 +++++++++++ qmp-commands.hx | 19 +++++++++++++++++++ 6 files changed, 75 insertions(+) diff --git a/arch_init.c b/arch_init.c index 4c8fcee..b8a4fb1 100644 --- a/arch_init.c +++ b/arch_init.c @@ -603,6 +603,25 @@ static void migration_bitmap_sync(void) } } +void qmp_dump_pc_ram(const char *file, Error **errp) { + + int rc; + int fd; + fd = open(file, + O_CREAT|O_WRONLY, + S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH); + assert(-1 != fd); + + RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks); + uint64_t offset; + for (offset=0; offset<ram_size; offset+=TARGET_PAGE_SIZE) { + rc = write(fd, block->host+offset, TARGET_PAGE_SIZE); + assert(TARGET_PAGE_SIZE == rc); + } + rc = close(fd); + assert(0 == rc); +} + /** * ram_save_page: Send the given page to the stream * diff --git a/hmp-commands.hx b/hmp-commands.hx index 3089533..0c47a4f 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1043,6 +1043,22 @@ gdb. Without -z|-l|-s, the dump format is ELF. 
ETEXI { + .name = "dump-pc-ram", + .args_type = "filename:F", + .params = "filename", + .help = "dump pc ram to file", + .mhandler.cmd = hmp_dump_pc_ram, + }, + + +STEXI +@item dump-pc-ram +@findex dump-guest-memory +Dump pc ram to file. + filename: dump file name +ETEXI + + { .name = "snapshot_blkdev", .args_type = "reuse:-n,device:B,snapshot-file:s?,format:s?", .params = "[-n] device [new-image-file] [format]", diff --git a/hmp.c b/hmp.c index f31ae27..5e27dd8 100644 --- a/hmp.c +++ b/hmp.c @@ -1473,6 +1473,15 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict) g_free(prot); } +void hmp_dump_pc_ram(Monitor *mon, const QDict *qdict) +{ + Error *errp = NULL; + const char *file = qdict_get_str(qdict, "filename"); + + qmp_dump_pc_ram(file, &errp); + hmp_handle_error(mon, &errp); +} + void hmp_netdev_add(Monitor *mon, const QDict *qdict) { Error *err = NULL; diff --git a/hmp.h b/hmp.h index 2b9308b..805a71b 100644 --- a/hmp.h +++ b/hmp.h @@ -79,6 +79,7 @@ void hmp_block_job_complete(Monitor *mon, const QDict *qdict); void hmp_migrate(Monitor *mon, const QDict *qdict); void hmp_device_del(Monitor *mon, const QDict *qdict); void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict); +void hmp_dump_pc_ram(Monitor *mon, const QDict *qdict); void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); void hmp_getfd(Monitor *mon, const QDict *qdict); diff --git a/qapi-schema.json b/qapi-schema.json index ac9594d..338bfd3 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -3648,3 +3648,14 @@ # Since: 2.1 ## { 'command': 'rtc-reset-reinjection' } + +## +# @dump-pc-ram: +# +# Checkpoints guest. 
+# +# @file: the file to save the memory to as binary data +# +# Returns: Nothing on success +## +{ 'command': 'dump-pc-ram', 'data': {'file': 'str'} } diff --git a/qmp-commands.hx b/qmp-commands.hx index 3a42ad0..0be3c42 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -854,6 +854,25 @@ Notes: EQMP { + .name = "dump-pc-ram", + .args_type = "file:s", + .params = "file", + .help = "dump pc ram to file", + .user_print = monitor_user_noop, + .mhandler.cmd_new = qmp_marshal_input_dump_pc_ram, + }, + +SQMP +dump + + +Dump pc ram to file. + +Arguments: + +EQMP + + { .name = "query-dump-guest-memory-capability", .args_type = "", .mhandler.cmd_new = qmp_marshal_input_query_dump_guest_memory_capability, -- 2.0.5 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing Bohdan Trach @ 2015-04-17 13:53 ` Eric Blake 2015-04-18 7:40 ` Bohdan Trach 2015-11-16 16:46 ` Dr. David Alan Gilbert 1 sibling, 1 reply; 23+ messages in thread From: Eric Blake @ 2015-04-17 13:53 UTC (permalink / raw) To: Bohdan Trach, qemu-devel; +Cc: Bohdan Trach, amit.shah, thomas.knauth, quintela On 04/17/2015 06:13 AM, Bohdan Trach wrote: > From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> > > dump-pc-mem command is added for checkpointing guest memory to > file. Only system RAM region is saved. This checkpoint is later used to > recover unchanged pages. > > Signed-off-by: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> > --- > > +void qmp_dump_pc_ram(const char *file, Error **errp) { > + > + int rc; > + int fd; > + fd = open(file, Please use qemu_open() rather than raw open(), so that your command automatically supports /dev/fdset/nnn notation for reusing a file descriptor passed in via SCM_RIGHTS. > +++ b/qapi-schema.json > @@ -3648,3 +3648,14 @@ > # Since: 2.1 > ## > { 'command': 'rtc-reset-reinjection' } > + > +## > +# @dump-pc-ram: > +# > +# Checkpoints guest. The whole guest, or just guest memory? > +# > +# @file: the file to save the memory to as binary data > +# > +# Returns: Nothing on success Missing a 'Since: 2.4' designation. > +## > +{ 'command': 'dump-pc-ram', 'data': {'file': 'str'} } -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing 2015-04-17 13:53 ` Eric Blake @ 2015-04-18 7:40 ` Bohdan Trach 0 siblings, 0 replies; 23+ messages in thread From: Bohdan Trach @ 2015-04-18 7:40 UTC (permalink / raw) To: Eric Blake, qemu-devel; +Cc: amit.shah, thomas.knauth, quintela Thank you for the review. Please see comments inline. On 04/17/2015 03:53 PM, Eric Blake wrote: > On 04/17/2015 06:13 AM, Bohdan Trach wrote: >> From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> >> >> dump-pc-mem command is added for checkpointing guest memory to >> file. Only system RAM region is saved. This checkpoint is later used to >> recover unchanged pages. >> >> Signed-off-by: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> >> --- > >> >> +void qmp_dump_pc_ram(const char *file, Error **errp) { >> + >> + int rc; >> + int fd; >> + fd = open(file, > > Please use qemu_open() rather than raw open(), so that your command > automatically supports /dev/fdset/nnn notation for reusing a file > descriptor passed in via SCM_RIGHTS. > OK, this will be fixed. >> +++ b/qapi-schema.json >> @@ -3648,3 +3648,14 @@ >> # Since: 2.1 >> ## >> { 'command': 'rtc-reset-reinjection' } >> + >> +## >> +# @dump-pc-ram: >> +# >> +# Checkpoints guest. > > The whole guest, or just guest memory? > The dump-pc-ram command currently writes the "pc.ram" MemoryRegion to the specified file. >> +# >> +# @file: the file to save the memory to as binary data >> +# >> +# Returns: Nothing on success > > Missing a 'Since: 2.4' designation. > OK, I'll add it. -- With best regards, Bohdan Trach ^ permalink raw reply [flat|nested] 23+ messages in thread
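The dump loop in patch 1/3 asserts that every write() transfers a whole page, so a short write or an interrupted call aborts the whole QEMU process. A minimal sketch of a more forgiving loop follows, assuming plain POSIX I/O. The names write_all and dump_region are hypothetical and not QEMU functions; inside QEMU the descriptor would come from qemu_open(), as requested in the review, and a failure would presumably be reported through errp rather than a bare return code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Write exactly `len` bytes from `buf` to `fd`.
 * Retries on EINTR and continues after short writes.
 * Returns 0 on success, -1 on error (errno is left as set by write()).
 * Hypothetical helper, not part of QEMU.
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted, nothing written: retry */
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Dump `size` bytes starting at `base` in pieces of at most `chunk` bytes,
 * mirroring the per-page loop in the patch. Returns 0 on success, -1 on error.
 */
static int dump_region(int fd, const uint8_t *base, size_t size, size_t chunk)
{
    size_t off;

    if (chunk == 0) {
        errno = EINVAL;
        return -1;
    }
    for (off = 0; off < size; off += chunk) {
        size_t todo = size - off < chunk ? size - off : chunk;

        if (write_all(fd, base + off, todo) < 0) {
            return -1;
        }
    }
    return 0;
}
```

A caller would pass the host pointer and length of the RAM block plus the page size as `chunk`, and turn a -1 result into a monitor error instead of aborting.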
* Re: [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing Bohdan Trach 2015-04-17 13:53 ` Eric Blake @ 2015-11-16 16:46 ` Dr. David Alan Gilbert 2015-11-17 15:38 ` Bohdan Trach 1 sibling, 1 reply; 23+ messages in thread From: Dr. David Alan Gilbert @ 2015-11-16 16:46 UTC (permalink / raw) To: Bohdan Trach; +Cc: Bohdan Trach, amit.shah, thomas.knauth, qemu-devel, quintela * Bohdan Trach (bv.trach@gmail.com) wrote: > From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> > > dump-pc-mem command is added for checkpointing guest memory to > file. Only system RAM region is saved. This checkpoint is later used to > recover unchanged pages. Why not just use the 'dump-guest-memory' commands; they dump it in interesting existing formats; they have headers in the files as well rather than just a raw blob of data. If you wanted to restrict to only certain RAM blocks, then I'd suggest adding a feature to that existing command. You might also find that you want other RAMBlocks as well, for example where RAM is added using hotplug, those are separate RAM blocks. 
Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing 2015-11-16 16:46 ` Dr. David Alan Gilbert @ 2015-11-17 15:38 ` Bohdan Trach 2015-11-17 16:02 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 23+ messages in thread From: Bohdan Trach @ 2015-11-17 15:38 UTC (permalink / raw) To: Dr. David Alan Gilbert; +Cc: amit.shah, thomas.knauth, qemu-devel, quintela Hi David, thank you for the feedback! On 11/16/2015 05:46 PM, Dr. David Alan Gilbert wrote: > * Bohdan Trach (bv.trach@gmail.com) wrote: >> From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> >> >> dump-pc-mem command is added for checkpointing guest memory to >> file. Only system RAM region is saved. This checkpoint is later used to >> recover unchanged pages. > > Why not just use the 'dump-guest-memory' commands; they dump it in interesting > existing formats; they have headers in the files as well rather than just > a raw blob of data. > If you wanted to restrict to only certain RAM blocks, then I'd suggest adding > a feature to that existing command. > You might also find that you want other RAMBlocks as well, for example where > RAM is added using hotplug, those are separate RAM blocks. We will try to rework these patches to use existing formats. The current format was used because it is extremely simple to work with. The restriction to saving only the 'pc.ram' RAMBlock is just a consequence of this design choice. > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > -- With best regards, Bohdan Trach ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing 2015-11-17 15:38 ` Bohdan Trach @ 2015-11-17 16:02 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 23+ messages in thread From: Dr. David Alan Gilbert @ 2015-11-17 16:02 UTC (permalink / raw) To: Bohdan Trach; +Cc: amit.shah, thomas.knauth, qemu-devel, quintela * Bohdan Trach (bohdan.trach@mailbox.tu-dresden.de) wrote: > Hi David, thank you for the feedback! No problem, sorry for the delay. > On 11/16/2015 05:46 PM, Dr. David Alan Gilbert wrote: > > * Bohdan Trach (bv.trach@gmail.com) wrote: > >> From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> > >> > >> dump-pc-mem command is added for checkpointing guest memory to > >> file. Only system RAM region is saved. This checkpoint is later used to > >> recover unchanged pages. > > > > Why not just use the 'dump-guest-memory' commands; they dump it in interesting > > existing formats; they have headers in the files as well rather than just > > a raw blob of data. > > If you wanted to restrict to only certain RAM blocks, then I'd suggest adding > > a feature to that existing command. > > You might also find that you want other RAMBlocks as well, for example where > > RAM is added using hotplug, those are separate RAM blocks. > > We will try to rework these patches to use existing formats. The current > format was used because it is extremely simple to work with. The > restriction to saving only the 'pc.ram' RAMBlock is just a consequence > of this design choice. I think the ELF one should actually turn out to be the easiest; there's probably some existing ELF code you can find in the QEMU code base to make it easier to parse, but once you find the start of the block it should then be contiguous. Dave > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > -- > With best regards, > Bohdan Trach -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 23+ messages in thread
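The remark that an ELF dump is easy to consume "once you find the start of the block" can be illustrated with a small sketch. It rests on assumptions that are not confirmed in this thread: the dump is a host-endian ELF64 file, each PT_LOAD program header records the guest-physical start address in p_paddr and the position of the data in p_offset, and the Linux <elf.h> header is available. The helper name elf64_paddr_to_offset is hypothetical and is not taken from the QEMU code base.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Locate guest-physical address `paddr` inside an ELF64 image held in
 * memory (`img`, `len` bytes). On success returns 0 and stores the file
 * offset of that byte in *file_off. Returns -1 if the image is malformed
 * or no PT_LOAD segment covers `paddr`.
 * Assumes a host-endian file; does not handle the PN_XNUM extension.
 */
static int elf64_paddr_to_offset(const uint8_t *img, size_t len,
                                 uint64_t paddr, uint64_t *file_off)
{
    Elf64_Ehdr eh;
    uint16_t i;

    if (len < sizeof(eh)) {
        return -1;
    }
    memcpy(&eh, img, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff > len) {
        return -1;
    }

    for (i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        uint64_t pos = eh.e_phoff + (uint64_t)i * sizeof(ph);

        if (pos > len || len - pos < sizeof(ph)) {
            return -1;              /* program header table is truncated */
        }
        memcpy(&ph, img + pos, sizeof(ph));
        if (ph.p_type == PT_LOAD &&
            paddr >= ph.p_paddr && paddr - ph.p_paddr < ph.p_filesz) {
            /* Data inside one segment is contiguous in the file. */
            *file_off = ph.p_offset + (paddr - ph.p_paddr);
            return 0;
        }
    }
    return -1;
}
```

With such a lookup, a checkpoint reader would no longer depend on the file being a raw image of a single RAM block, which is the limitation raised earlier in this sub-thread.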
* [Qemu-devel] [PATCH RFC 2/3] memory: implement checkpoint handling 2015-04-17 12:12 [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal Bohdan Trach 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing Bohdan Trach @ 2015-04-17 12:13 ` Bohdan Trach 2015-11-16 16:56 ` Dr. David Alan Gilbert 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration Bohdan Trach ` (2 subsequent siblings) 4 siblings, 1 reply; 23+ messages in thread From: Bohdan Trach @ 2015-04-17 12:13 UTC (permalink / raw) To: qemu-devel; +Cc: Bohdan Trach, amit.shah, thomas.knauth, quintela From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> This commit adds functions, which are used to work with checkpoint files. A new command-line option `-checkpoint` is added, which is used to specify the checkpoint file. Currently, MD5 function from OpenSSL is used to checkpoint memory. Signed-off-by: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> --- arch_init.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ configure | 2 + qemu-options.hx | 9 ++++ vl.c | 12 +++++ 4 files changed, 172 insertions(+) diff --git a/arch_init.c b/arch_init.c index b8a4fb1..eda86d4 100644 --- a/arch_init.c +++ b/arch_init.c @@ -27,6 +27,7 @@ #ifndef _WIN32 #include <sys/types.h> #include <sys/mman.h> +#include <openssl/md5.h> #endif #include "config.h" #include "monitor/monitor.h" @@ -140,6 +141,8 @@ static struct defconfig_file { static const uint8_t ZERO_TARGET_PAGE[TARGET_PAGE_SIZE]; +int fd_checkpoint = -1; + int qemu_read_default_config_files(bool userconfig) { int ret; @@ -184,6 +187,30 @@ static void XBZRLE_cache_lock(void) qemu_mutex_lock(&XBZRLE.lock); } +#ifdef DEBUG_ARCH_INIT +static char* md5s(const uint8_t *digest) { + /* MD5 is 16 bytes, i.e., 32 hexadecimal digits. + 1 for trailing \0. 
*/ + static const size_t size = 32 + 1; + static char hex_digits[32+1]; + + /* snprintf(hex_digits, */ + /* MD5_DIGEST_LENGTH+1, */ + /* "%016lx", */ + /* *((uint64_t*)digest)); */ + /* snprintf(hex_digits+MD5_DIGEST_LENGTH, */ + /* MD5_DIGEST_LENGTH+1, */ + /* "%016lx", */ + /* *((uint64_t*)(digest+sizeof(uint64_t)))); */ + int digit; + for (digit = 0; digit < 32; digit += 2) { + snprintf(hex_digits+digit, 3, "%02x", digest[digit/2]); + } + + hex_digits[size-1] = '\0'; + return hex_digits; +} +#endif + static void XBZRLE_cache_unlock(void) { if (migrate_use_xbzrle()) @@ -603,6 +630,126 @@ static void migration_bitmap_sync(void) } } +int uint128_compare(const void*, const void*); +int uint128_compare(const void* x, const void* y) +{ + return memcmp(x, y, MD5_DIGEST_LENGTH); +} + +/* indexed by page number */ +static uint64_t hashes_size = 0; +static uint64_t hashes_entries = 0; +static uint8_t *hashes = 0; + +static uint32_t get_page_nr(uint64_t addr) { + assert((addr % TARGET_PAGE_SIZE) == 0); + return (addr / TARGET_PAGE_SIZE); +} + +typedef struct { + uint8_t hash[MD5_DIGEST_LENGTH]; + uint64_t offset; +} hash_offset_entry; + +static uint64_t hash_offset_entries = 0; +static uint64_t max_hash_offset_entries; +static hash_offset_entry* hash_offset_array = 0; +static uint8_t all_zeroes_hash[MD5_DIGEST_LENGTH]; + +int cmp_hash_offset_entry(const void*, const void*); +int cmp_hash_offset_entry(const void* a, const void* b) { + hash_offset_entry* e = (hash_offset_entry*) a; + hash_offset_entry* f = (hash_offset_entry*) b; + + return memcmp(e->hash, f->hash, MD5_DIGEST_LENGTH); +} + +void veecycle_init(void); +void veecycle_init(void) { + RAMBlock *block; + block = QLIST_FIRST_RCU(&ram_list.blocks); + if (block == NULL) + return; + assert(0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))); + + max_hash_offset_entries = hashes_entries = (ram_size / TARGET_PAGE_SIZE); + DPRINTF("pages=%lu\n", hashes_entries); + hashes_size = hashes_entries * MD5_DIGEST_LENGTH; + 
+ hashes = g_malloc(hashes_size); + assert(0 != hashes); + bzero(hashes, hashes_size); + + uint8_t all_zeroes[TARGET_PAGE_SIZE]; + bzero(all_zeroes, TARGET_PAGE_SIZE); + MD5(all_zeroes, TARGET_PAGE_SIZE, all_zeroes_hash); + + hash_offset_array = g_malloc(max_hash_offset_entries * sizeof(hash_offset_entry)); + assert(0 != hash_offset_array); + bzero(hash_offset_array, max_hash_offset_entries * sizeof(hash_offset_entry)); +} + +void init_checksum_lookup_table(const char *checkpoint_path); +void init_checksum_lookup_table(const char *checkpoint_path) { + int rc; + uint8_t* ram; + RAMBlock *block; + + DPRINTF("ram_size=%lu\n", ram_size); + + struct stat sb; + rc = stat(checkpoint_path, &sb); + if (rc == -1 && errno == ENOENT) return; + assert(0 == rc); + + block = QLIST_FIRST_RCU(&ram_list.blocks); + assert(0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))); + ram = block->host; + assert(block->used_length == ram_size); + + /* Ignore checkpoint file if size is different from VM's current memory size. 
*/ + assert(sb.st_size == ram_size); + + fd_checkpoint = open(checkpoint_path, O_RDWR); + assert(fd_checkpoint != -1); + + uint64_t idx; + for (idx=0; idx<ram_size; idx+=TARGET_PAGE_SIZE) { + rc = read(fd_checkpoint, ram+idx, TARGET_PAGE_SIZE); + assert(rc == TARGET_PAGE_SIZE); + assert(hash_offset_entries < max_hash_offset_entries); + MD5((unsigned char*)(ram+idx), + TARGET_PAGE_SIZE, + (unsigned char*)hash_offset_array[hash_offset_entries].hash); + + hash_offset_array[hash_offset_entries].offset = idx; + + DPRINTF("hash=%s offset=%08lx\n", + md5s(hash_offset_array[hash_offset_entries].hash), + hash_offset_array[hash_offset_entries].offset); + + hash_offset_entries++; + }; + + qsort(hash_offset_array, hash_offset_entries, sizeof(hash_offset_entry), + cmp_hash_offset_entry); +} + +int is_ram_addr(void* host); +int is_ram_addr(void* host) { + static RAMBlock *block = NULL; + + block = QLIST_FIRST_RCU(&ram_list.blocks); + assert(0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))); + + return (host >= memory_region_get_ram_ptr(block->mr) && + host < memory_region_get_ram_ptr(block->mr) + block->used_length); +} + +static int is_outgoing_with_checkpoint(void) { + return (fd_checkpoint != -1); +} + void qmp_dump_pc_ram(const char *file, Error **errp) { int rc; @@ -869,6 +1016,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque) bitmap_sync_count = 0; migration_bitmap_sync_init(); + qsort(hashes, hashes_entries, MD5_DIGEST_LENGTH, uint128_compare); + if (migrate_use_xbzrle()) { XBZRLE_cache_lock(); XBZRLE.cache = cache_init(migrate_xbzrle_cache_size() / diff --git a/configure b/configure index 6969f6f..fd6dd23 100755 --- a/configure +++ b/configure @@ -337,6 +337,8 @@ vhdx="" quorum="" numa="" +LIBS="-lcrypto" + # parse CC options first for opt do optarg=`expr "x$opt" : 'x[^=]*=\(.*\)'` diff --git a/qemu-options.hx b/qemu-options.hx index 319d971..ece4758 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -268,6 +268,15 @@ If @var{slots} and @var{maxmem} 
are not specified, memory hotplug won't be enabled and the guest startup RAM will never increase. ETEXI +DEF("checkpoint", HAS_ARG, QEMU_OPTION_checkpoint, + "-checkpoint file path to checkpoint file\n", QEMU_ARCH_ALL) +STEXI +@item -checkpoint @var{path} +@findex -checkpoint +Checkpoint file to use during incoming migrations. +Reduces network traffic and total migration time. +ETEXI + DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath, "-mem-path FILE provide backing storage for guest RAM\n", QEMU_ARCH_ALL) STEXI diff --git a/vl.c b/vl.c index 74c2681..d423e99 100644 --- a/vl.c +++ b/vl.c @@ -134,6 +134,7 @@ int display_opengl; static int display_remote; const char* keyboard_layout = NULL; ram_addr_t ram_size; +const char *checkpoint_path = NULL; const char *mem_path = NULL; int mem_prealloc = 0; /* force preallocation of physical target memory */ bool enable_mlock = false; @@ -2643,6 +2644,9 @@ out: return 0; } +void init_checksum_lookup_table(const char *checkpoint_path); +void veecycle_init(void); + static void set_memory_options(uint64_t *ram_slots, ram_addr_t *maxram_size) { uint64_t sz; @@ -3116,6 +3120,9 @@ int main(int argc, char **argv, char **envp) } break; #endif + case QEMU_OPTION_checkpoint: + checkpoint_path = optarg; + break; case QEMU_OPTION_mempath: mem_path = optarg; break; @@ -4331,6 +4338,7 @@ int main(int argc, char **argv, char **envp) } } + veecycle_init(); qdev_prop_check_globals(); if (vmstate_dump_file) { /* dump and exit */ @@ -4339,6 +4347,10 @@ int main(int argc, char **argv, char **envp) } if (incoming) { + if (checkpoint_path) { + init_checksum_lookup_table(checkpoint_path); + } + Error *local_err = NULL; qemu_start_incoming_migration(incoming, &local_err); if (local_err) { -- 2.0.5 ^ permalink raw reply related [flat|nested] 23+ messages in thread
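The data structure built by init_checksum_lookup_table() in patch 2/3 (hash every checkpoint page, record its offset, sort by hash) can be sketched in isolation. The following is a simplified illustration under stated assumptions: FNV-1a stands in for MD5 only to keep the example free of the OpenSSL dependency and is not collision resistant, SKETCH_PAGE_SIZE stands in for TARGET_PAGE_SIZE, all names are hypothetical, and the bsearch()-based lookup is an assumption about how the sorted table is meant to be consumed, since this patch only builds and sorts it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* stand-in for TARGET_PAGE_SIZE */

typedef struct {
    uint64_t hash;      /* stand-in digest; the patch stores 16 MD5 bytes */
    uint64_t offset;    /* byte offset of the page inside the checkpoint */
} page_entry;

/* FNV-1a over one page. NOT collision resistant; illustration only. */
static uint64_t page_hash(const uint8_t *page)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    size_t i;

    for (i = 0; i < SKETCH_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

static int cmp_entry(const void *a, const void *b)
{
    const page_entry *e = a;
    const page_entry *f = b;

    return (e->hash > f->hash) - (e->hash < f->hash);
}

/* Hash every page of `image`; return a table sorted by hash, or NULL. */
static page_entry *build_table(const uint8_t *image, size_t npages)
{
    page_entry *t = calloc(npages, sizeof(*t));
    size_t i;

    if (!t) {
        return NULL;
    }
    for (i = 0; i < npages; i++) {
        t[i].hash = page_hash(image + i * SKETCH_PAGE_SIZE);
        t[i].offset = (uint64_t)i * SKETCH_PAGE_SIZE;
    }
    qsort(t, npages, sizeof(*t), cmp_entry);
    return t;
}

/*
 * Return the checkpoint offset of a page whose content equals `page`,
 * or -1 if no such page exists. The bytes are compared after the hash
 * matches, because the stand-in hash may collide.
 */
static int64_t lookup_page(const page_entry *t, size_t npages,
                           const uint8_t *image, const uint8_t *page)
{
    page_entry key = { page_hash(page), 0 };
    const page_entry *hit = bsearch(&key, t, npages, sizeof(*t), cmp_entry);

    if (!hit || memcmp(image + hit->offset, page, SKETCH_PAGE_SIZE) != 0) {
        return -1;
    }
    return (int64_t)hit->offset;
}
```

Because the table is keyed by content rather than by page number, a page that moved to a different guest address between checkpoint and migration can still be found. When several pages share a hash, bsearch() may return any of them, so a full implementation would have to scan the neighbouring entries as well.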
* Re: [Qemu-devel] [PATCH RFC 2/3] memory: implement checkpoint handling 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 2/3] memory: implement checkpoint handling Bohdan Trach @ 2015-11-16 16:56 ` Dr. David Alan Gilbert 2015-11-17 15:38 ` Bohdan Trach 0 siblings, 1 reply; 23+ messages in thread From: Dr. David Alan Gilbert @ 2015-11-16 16:56 UTC (permalink / raw) To: Bohdan Trach; +Cc: Bohdan Trach, amit.shah, thomas.knauth, qemu-devel, quintela * Bohdan Trach (bv.trach@gmail.com) wrote: > From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> > > This commit adds functions, which are used to work with checkpoint > files. A new command-line option `-checkpoint` is added, which is used > to specify the checkpoint file. Currently, MD5 function from OpenSSL is > used to checkpoint memory. > > Signed-off-by: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> > --- > arch_init.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > configure | 2 + > qemu-options.hx | 9 ++++ > vl.c | 12 +++++ > 4 files changed, 172 insertions(+) > > diff --git a/arch_init.c b/arch_init.c > index b8a4fb1..eda86d4 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -27,6 +27,7 @@ > #ifndef _WIN32 > #include <sys/types.h> > #include <sys/mman.h> > +#include <openssl/md5.h> > #endif > #include "config.h" > #include "monitor/monitor.h" > @@ -140,6 +141,8 @@ static struct defconfig_file { > > static const uint8_t ZERO_TARGET_PAGE[TARGET_PAGE_SIZE]; > > +int fd_checkpoint = -1; > + > int qemu_read_default_config_files(bool userconfig) > { > int ret; > @@ -184,6 +187,30 @@ static void XBZRLE_cache_lock(void) > qemu_mutex_lock(&XBZRLE.lock); > } > > +#ifdef DEBUG_ARCH_INIT > +static char* md5s(const uint8_t *digest) { > + /* MD5 is 16 bytes, i.e., 32 hexadecimal digits. + 1 for trailing \0. 
*/ > + static const size_t size = 32 + 1; > + static char hex_digits[32+1]; > + > + /* snprintf(hex_digits, */ > + /* MD5_DIGEST_LENGTH+1, */ > + /* "%016lx", */ > + /* *((uint64_t*)digest)); */ > + /* snprintf(hex_digits+MD5_DIGEST_LENGTH, */ > + /* MD5_DIGEST_LENGTH+1, */ > + /* "%016lx", */ > + /* *((uint64_t*)(digest+sizeof(uint64_t)))); */ > + int digit; > + for (digit = 0; digit < 32; digit += 2) { > + snprintf(hex_digits+digit, 3, "%02x", digest[digit/2]); > + } > + > + hex_digits[size-1] = '\0'; > + return hex_digits; > +} > +#endif > + > static void XBZRLE_cache_unlock(void) > { > if (migrate_use_xbzrle()) > @@ -603,6 +630,126 @@ static void migration_bitmap_sync(void) > } > } > > +int uint128_compare(const void*, const void*); > +int uint128_compare(const void* x, const void* y) > +{ > + return memcmp(x, y, MD5_DIGEST_LENGTH); > +} Is anything in qemu/int128.h useful here? However, as mentioned in my previous follow-up, I think you need something stronger than MD5 to stop collisions; sha256 seems appropriate and CPUs have acceleration instructions for it these days. > +/* indexed by page number */ > +static uint64_t hashes_size = 0; > +static uint64_t hashes_entries = 0; > +static uint8_t *hashes = 0; > + > +static uint32_t get_page_nr(uint64_t addr) { > + assert((addr % TARGET_PAGE_SIZE) == 0); > + return (addr / TARGET_PAGE_SIZE); > +} > + > +typedef struct { > + uint8_t hash[MD5_DIGEST_LENGTH]; > + uint64_t offset; > +} hash_offset_entry; > + > +static uint64_t hash_offset_entries = 0; > +static uint64_t max_hash_offset_entries; > +static hash_offset_entry* hash_offset_array = 0; > +static uint8_t all_zeroes_hash[MD5_DIGEST_LENGTH]; > + > +int cmp_hash_offset_entry(const void*, const void*); > +int cmp_hash_offset_entry(const void* a, const void* b) { You seem to do this trick of declaring and then defining a lot; if you need it only within a file then make it static and then you don't need the declaration unless you use it before its definition. 
If you want to use it outside of this file then the declaration should be in a header. I guess this stuff should be in migration/ram.c these days? > + hash_offset_entry* e = (hash_offset_entry*) a; > + hash_offset_entry* f = (hash_offset_entry*) b; > + > + return memcmp(e->hash, f->hash, MD5_DIGEST_LENGTH); > +} > + > +void veecycle_init(void); > +void veecycle_init(void) { Nice name; but if you're using a cute name make sure that you put a big comment to let people know what they're looking at! > + RAMBlock *block; > + block = QLIST_FIRST_RCU(&ram_list.blocks); > + if (block == NULL) > + return; > + assert(0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))); This also makes it PC-specific; what about everything else? > + max_hash_offset_entries = hashes_entries = (ram_size / TARGET_PAGE_SIZE); > + DPRINTF("pages=%lu\n", hashes_entries); > + hashes_size = hashes_entries * MD5_DIGEST_LENGTH; > + > + hashes = g_malloc(hashes_size); > + assert(0 != hashes); > + bzero(hashes, hashes_size); Here you can use g_new0 to allocate and zero-fill, but be careful: since these things are probably quite big, you might want to use one of the g_try_ allocators. 
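As an illustration of the allocation comment above (not part of the original message): the patch allocates with g_malloc(), which aborts the process when memory runs out, so the assert that follows it can never fire. The sketch below shows the fallible, zero-filling pattern in plain C, with calloc() standing in for GLib's g_try_new0() so that the example has no dependencies. The function alloc_hash_table and its parameters are hypothetical; the struct mirrors hash_offset_entry from the patch with the digest length written out as 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t hash[16];   /* MD5_DIGEST_LENGTH bytes in the patch */
    uint64_t offset;
} hash_offset_entry;

/*
 * Allocate one zero-filled entry per guest page. Returns NULL instead of
 * aborting when the request is invalid or cannot be satisfied, so the
 * caller can fall back to migrating without a checkpoint.
 * Hypothetical helper; calloc() stands in for g_try_new0().
 */
static hash_offset_entry *alloc_hash_table(uint64_t ram_bytes,
                                           size_t page_size,
                                           size_t *n_out)
{
    uint64_t pages;
    hash_offset_entry *t;

    if (page_size == 0) {
        return NULL;
    }
    pages = ram_bytes / page_size;
    if (pages == 0 || pages > SIZE_MAX) {
        return NULL;
    }
    /* calloc() checks the count * size multiplication for overflow. */
    t = calloc((size_t)pages, sizeof(*t));
    if (!t) {
        return NULL;
    }
    *n_out = (size_t)pages;
    return t;
}
```

A NULL result here would let the caller disable the checkpoint path and continue with an ordinary migration instead of taking the guest down.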
> + uint8_t all_zeroes[TARGET_PAGE_SIZE]; > + bzero(all_zeroes, TARGET_PAGE_SIZE); > + MD5(all_zeroes, TARGET_PAGE_SIZE, all_zeroes_hash); > + > + hash_offset_array = g_malloc(max_hash_offset_entries * sizeof(hash_offset_entry)); > + assert(0 != hash_offset_array); > + bzero(hash_offset_array, max_hash_offset_entries * sizeof(hash_offset_entry)); > +} > + > +void init_checksum_lookup_table(const char *checkpoint_path); > +void init_checksum_lookup_table(const char *checkpoint_path) { > + int rc; > + uint8_t* ram; > + RAMBlock *block; > + > + DPRINTF("ram_size=%lu\n", ram_size); > + > + struct stat sb; > + rc = stat(checkpoint_path, &sb); > + if (rc == -1 && errno == ENOENT) return; > + assert(0 == rc); > + > + block = QLIST_FIRST_RCU(&ram_list.blocks); > + assert(0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))); > + ram = block->host; > + assert(block->used_length == ram_size); > + > + /* Ignore checkpoint file if size is different from VM's current memory size. */ > + assert(sb.st_size == ram_size); Why does this matter? Can't you reuse the hash of pages that are at different locations in the stored file? e.g. a hash from an old/future boot of the same VM or one where the page got moved but unchanged? 
> + fd_checkpoint = open(checkpoint_path, O_RDWR); > + assert(fd_checkpoint != -1); > + > + uint64_t idx; > + for (idx=0; idx<ram_size; idx+=TARGET_PAGE_SIZE) { > + rc = read(fd_checkpoint, ram+idx, TARGET_PAGE_SIZE); > + assert(rc == TARGET_PAGE_SIZE); > + assert(hash_offset_entries < max_hash_offset_entries); > + MD5((unsigned char*)(ram+idx), > + TARGET_PAGE_SIZE, > + (unsigned char*)hash_offset_array[hash_offset_entries].hash); > + > + hash_offset_array[hash_offset_entries].offset = idx; > + > + DPRINTF("hash=%s offset=%08lx\n", > + md5s(hash_offset_array[hash_offset_entries].hash), > + hash_offset_array[hash_offset_entries].offset); > + > + hash_offset_entries++; > + }; > + > + qsort(hash_offset_array, hash_offset_entries, sizeof(hash_offset_entry), > + cmp_hash_offset_entry); > +} > + > +int is_ram_addr(void* host); > +int is_ram_addr(void* host) { > + static RAMBlock *block = NULL; > + > + block = QLIST_FIRST_RCU(&ram_list.blocks); > + assert(0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))); > + > + return (host >= memory_region_get_ram_ptr(block->mr) && > + host < memory_region_get_ram_ptr(block->mr) + block->used_length); > +} > + > +static int is_outgoing_with_checkpoint(void) { > + return (fd_checkpoint != -1); > +} > + > void qmp_dump_pc_ram(const char *file, Error **errp) { > > int rc; > @@ -869,6 +1016,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque) > bitmap_sync_count = 0; > migration_bitmap_sync_init(); > > + qsort(hashes, hashes_entries, MD5_DIGEST_LENGTH, uint128_compare); > + > if (migrate_use_xbzrle()) { > XBZRLE_cache_lock(); > XBZRLE.cache = cache_init(migrate_xbzrle_cache_size() / > diff --git a/configure b/configure > index 6969f6f..fd6dd23 100755 > --- a/configure > +++ b/configure > @@ -337,6 +337,8 @@ vhdx="" > quorum="" > numa="" > > +LIBS="-lcrypto" > + > # parse CC options first > for opt do > optarg=`expr "x$opt" : 'x[^=]*=\(.*\)'` > diff --git a/qemu-options.hx b/qemu-options.hx > index 319d971..ece4758 100644 > 
--- a/qemu-options.hx > +++ b/qemu-options.hx > @@ -268,6 +268,15 @@ If @var{slots} and @var{maxmem} are not specified, memory hotplug won't > be enabled and the guest startup RAM will never increase. > ETEXI > > +DEF("checkpoint", HAS_ARG, QEMU_OPTION_checkpoint, > + "-checkpoint file path to checkpoint file\n", QEMU_ARCH_ALL) > +STEXI > +@item -checkpoint @var{path} > +@findex -checkpoint > +Checkpoint file to use during incoming migrations. > +Reduces network traffic and total migration time. > +ETEXI > + > DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath, > "-mem-path FILE provide backing storage for guest RAM\n", QEMU_ARCH_ALL) > STEXI > diff --git a/vl.c b/vl.c > index 74c2681..d423e99 100644 > --- a/vl.c > +++ b/vl.c > @@ -134,6 +134,7 @@ int display_opengl; > static int display_remote; > const char* keyboard_layout = NULL; > ram_addr_t ram_size; > +const char *checkpoint_path = NULL; > const char *mem_path = NULL; > int mem_prealloc = 0; /* force preallocation of physical target memory */ > bool enable_mlock = false; > @@ -2643,6 +2644,9 @@ out: > return 0; > } > > +void init_checksum_lookup_table(const char *checkpoint_path); > +void veecycle_init(void); > + > static void set_memory_options(uint64_t *ram_slots, ram_addr_t *maxram_size) > { > uint64_t sz; > @@ -3116,6 +3120,9 @@ int main(int argc, char **argv, char **envp) > } > break; > #endif > + case QEMU_OPTION_checkpoint: > + checkpoint_path = optarg; > + break; > case QEMU_OPTION_mempath: > mem_path = optarg; > break; > @@ -4331,6 +4338,7 @@ int main(int argc, char **argv, char **envp) > } > } > > + veecycle_init(); > qdev_prop_check_globals(); > if (vmstate_dump_file) { > /* dump and exit */ > @@ -4339,6 +4347,10 @@ int main(int argc, char **argv, char **envp) > } > > if (incoming) { > + if (checkpoint_path) { > + init_checksum_lookup_table(checkpoint_path); > + } > + > Error *local_err = NULL; > qemu_start_incoming_migration(incoming, &local_err); > if (local_err) { > -- > 2.0.5 > Dave > -- Dr. 
David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH RFC 2/3] memory: implement checkpoint handling
  2015-11-16 16:56 ` Dr. David Alan Gilbert
@ 2015-11-17 15:38   ` Bohdan Trach
From: Bohdan Trach @ 2015-11-17 15:38 UTC
To: Dr. David Alan Gilbert; +Cc: amit.shah, thomas.knauth, qemu-devel, quintela

On 11/16/2015 05:56 PM, Dr. David Alan Gilbert wrote:
> Is anything in qemu/int128.h useful here?
> However, as mentioned in my previous follow-up,
> I think you need something stronger than MD5 to stop collisions;
> sha256 seems appropriate and CPUs have acceleration instructions
> for it these days.

Ok, we will switch to the SHA256 hash from GnuTLS.

> You seem to do this trick of declaring and then defining a lot;
> if you need it only within a file then make it static and then you
> don't need the declaration unless you use it before its definition.
> If you want to use it outside of this file then the declaration should
> be in a header.

Will fix this.

> I guess this stuff should be in migration/ram.c these days?

Yes, this code was written for the 2.3 branch, and definitely needs updating.

> Nice name; but if you're using a cute name make sure that you put
> a big comment to let people know what they're looking at!

Will fix this.

> This also makes it PC specific; what about everything else?

I believe if we switch to existing memory dump formats this
restriction will be removed.

>> +    /* Ignore checkpoint file if size is different from VM's current memory size. */
>> +    assert(sb.st_size == ram_size);
>
> Why does this matter? Can't you reuse the hash of pages that are at different
> locations in the stored file? e.g. a hash from an old/future boot of the same
> VM or one where the page got moved but unchanged?

Here we check whether the size of the main memory block matches the
checkpoint. As we didn't use any format, we wanted to detect at least
some cases where a checkpoint from a different VM is passed to QEMU by
accident.

> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

--
With best regards,
Bohdan Trach
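
The question about reusing pages that sit at a different offset in the checkpoint comes down to indexing the checkpoint by content rather than by address. A minimal sketch of that idea in plain C follows. It assumes a fixed 4 KiB page and a checkpoint image already held in memory, and it uses a toy 64-bit FNV-1a digest purely as a stand-in for SHA-256. The names `index_entry`, `build_index` and `lookup_page` are hypothetical and do not come from the patch. The sketch also returns NULL on an unusable image size or allocation failure rather than asserting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE ((size_t)4096) /* assumed page size for this sketch */

typedef struct {
    uint64_t digest; /* digest of the page contents */
    uint64_t offset; /* byte offset of the page inside the checkpoint */
} index_entry;

/* Toy FNV-1a digest: a stand-in for SHA-256, NOT collision resistant. */
static uint64_t toy_hash(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Order entries by digest so that bsearch() can be used for lookups. */
static int cmp_entry(const void *a, const void *b)
{
    const index_entry *e = a, *f = b;
    return (e->digest > f->digest) - (e->digest < f->digest);
}

/* Build a digest-sorted index over the checkpoint image.
 * Returns NULL if the size is not a whole number of pages or if
 * allocation fails; the caller frees the result with free(). */
static index_entry *build_index(const uint8_t *image, size_t image_len,
                                size_t *n_out)
{
    if (image_len == 0 || image_len % SKETCH_PAGE_SIZE != 0) {
        return NULL;
    }
    size_t n = image_len / SKETCH_PAGE_SIZE;
    index_entry *idx = calloc(n, sizeof(*idx));
    if (idx == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        idx[i].digest = toy_hash(image + i * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
        idx[i].offset = (uint64_t)(i * SKETCH_PAGE_SIZE);
    }
    qsort(idx, n, sizeof(*idx), cmp_entry);
    *n_out = n;
    return idx;
}

/* Return the checkpoint offset of some page whose digest matches the
 * given page, or -1 if the checkpoint holds no such page. */
static int64_t lookup_page(const index_entry *idx, size_t n,
                           const uint8_t *page)
{
    assert(idx != NULL && page != NULL);
    index_entry key = { .digest = toy_hash(page, SKETCH_PAGE_SIZE), .offset = 0 };
    const index_entry *hit = bsearch(&key, idx, n, sizeof(*idx), cmp_entry);
    return hit ? (int64_t)hit->offset : -1;
}
```

Because the lookup goes by digest, a page that moved to another guest address still resolves to its old checkpoint offset. With a weak digest like the toy one here, a real implementation would have to compare the page bytes after a hit.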
* [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration 2015-04-17 12:12 [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal Bohdan Trach 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing Bohdan Trach 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 2/3] memory: implement checkpoint handling Bohdan Trach @ 2015-04-17 12:13 ` Bohdan Trach 2015-11-17 12:26 ` Dr. David Alan Gilbert 2015-04-24 11:38 ` [Qemu-devel] [PATCH RFC, Ping 0/3] Checkpoint-assisted migration proposal Bohdan Trach 2015-09-15 10:39 ` [Qemu-devel] [PATCH RFC " Amit Shah 4 siblings, 1 reply; 23+ messages in thread From: Bohdan Trach @ 2015-04-17 12:13 UTC (permalink / raw) To: qemu-devel; +Cc: Bohdan Trach, amit.shah, thomas.knauth, quintela From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> Extend memory page saving and loading functions to utilize information available in checkpoints to avoid sending full pages over the network. Signed-off-by: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> --- arch_init.c | 167 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 158 insertions(+), 9 deletions(-) diff --git a/arch_init.c b/arch_init.c index eda86d4..fca56f0 100644 --- a/arch_init.c +++ b/arch_init.c @@ -128,6 +128,8 @@ static uint64_t bitmap_sync_count; #define RAM_SAVE_FLAG_CONTINUE 0x20 #define RAM_SAVE_FLAG_XBZRLE 0x40 /* 0x80 is reserved in migration.h start with 0x100 next */ +#define RAM_SAVE_FLAG_HASH 0x100 +#define RAM_SAVE_FLAG_PAGE_HASH 0x200 static struct defconfig_file { const char *filename; @@ -790,6 +792,7 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset, uint8_t *p; int ret; bool send_async = true; + uint8_t hash[MD5_DIGEST_LENGTH]; p = memory_region_get_ram_ptr(mr) + offset; @@ -841,16 +844,45 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset, /* XBZRLE overflow or normal page */ if (pages == -1) { - *bytes_transferred += 
save_page_header(f, block, - offset | RAM_SAVE_FLAG_PAGE); - if (send_async) { - qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE); - } else { - qemu_put_buffer(f, p, TARGET_PAGE_SIZE); + if (is_outgoing_with_checkpoint() && + 0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))) { + MD5(p, TARGET_PAGE_SIZE, hash); + + if (NULL != bsearch(hash, hashes, hashes_entries, + MD5_DIGEST_LENGTH, uint128_compare)) { + + *bytes_transferred += save_page_header(f, block, offset | RAM_SAVE_FLAG_HASH); +#ifdef DEBUG_HASH + qemu_put_buffer(f, p, TARGET_PAGE_SIZE); + *bytes_transferred += TARGET_PAGE_SIZE; +#endif + qemu_put_buffer(f, hash, MD5_DIGEST_LENGTH); + *bytes_transferred += MD5_DIGEST_LENGTH; + pages = 1; + DPRINTF("ram_save_page: FLAG_HASH guest_phy_addr=%08lx flags=%lx hash=%s\n", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_HASH)& ~TARGET_PAGE_MASK, md5s(hash)); + } else { + *bytes_transferred += save_page_header(f, block, offset | RAM_SAVE_FLAG_PAGE_HASH); + qemu_put_buffer(f, p, TARGET_PAGE_SIZE); + qemu_put_buffer(f, hash, MD5_DIGEST_LENGTH); + *bytes_transferred += TARGET_PAGE_SIZE; + *bytes_transferred += MD5_DIGEST_LENGTH; + pages = 1; + DPRINTF("ram_save_page: FLAG_PAGE_HASH guest_phy_addr=%08lx flags=%lx hash=%s\n", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_PAGE_HASH)& ~TARGET_PAGE_MASK, md5s(hash)); + } + } + if (pages == -1) { + *bytes_transferred += save_page_header(f, block, + offset | RAM_SAVE_FLAG_PAGE); + if (send_async) { + qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE); + } else { + qemu_put_buffer(f, p, TARGET_PAGE_SIZE); + } + *bytes_transferred += TARGET_PAGE_SIZE; + pages = 1; + acct_info.norm_pages++; + DPRINTF("ram_save_page: FLAG_PAGE guest_phy_addr=%08lx flags=%lx", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_PAGE)& ~TARGET_PAGE_MASK); } - *bytes_transferred += TARGET_PAGE_SIZE; - pages = 1; - acct_info.norm_pages++; } XBZRLE_cache_unlock(); @@ -963,6 +995,8 @@ void free_xbzrle_decoded_buf(void) xbzrle_decoded_buf = NULL; } 
+extern const char *checkpoint_path; + static void migration_end(void) { if (migration_bitmap) { @@ -1281,6 +1315,58 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) } } +/** + * If migration source determined we already have the chunk, it only + * sends a hash of the page's content. Read it from local storage, + * e.g., an old checkpoint. + * @param host Address which, after this function, should have a content matching the functions 2nd parameter. + * @param hash The hash value. + * @param size Size of the memory region in bytes. Typically, size is a single page, e.g., 4 KiB. + * @param fd file descriptor of checkpoint file + */ +void ram_handle_hash(void *host, uint64_t guest_phy_addr, uint8_t *hash, uint64_t size); +void ram_handle_hash(void *host, uint64_t guest_phy_addr, uint8_t *hash, uint64_t size) +{ + assert(fd_checkpoint != -1); + + /* fprintf(stdout, "ram_handle_hash: incoming has %u!\n", hash); */ + uint8_t local_page_hash[MD5_DIGEST_LENGTH]; + MD5(host, TARGET_PAGE_SIZE, local_page_hash); + + if (0 != memcmp(local_page_hash, hash, MD5_DIGEST_LENGTH)) { + /* Computed hash does not match the hash the migration source + sent us for this page. */ + hash_offset_entry* v = bsearch(hash, hash_offset_array, hash_offset_entries, + sizeof(hash_offset_entry), cmp_hash_offset_entry); + if (v == NULL) { + /* For some reason the source thought the destination + already has this block. But it doesn't. Hmmm ... 
*/ + DPRINTF("ram_handle_hash: unknown hash %s at guest phy addr %08lx\n", md5s(hash), guest_phy_addr); + assert(0); + } + + DPRINTF("ram_handle_hash: guest_phy_addr=%08lx, hash=%s, offset=%08lx\n", guest_phy_addr, md5s(hash), v->offset); + + off_t offset_actual = lseek(fd_checkpoint, v->offset, SEEK_SET); + assert(offset_actual == v->offset); + + ssize_t read_actual = read(fd_checkpoint, host, TARGET_PAGE_SIZE); + assert(read_actual == TARGET_PAGE_SIZE); + MD5(host, TARGET_PAGE_SIZE, local_page_hash); + if (0 != memcmp(local_page_hash, hash, MD5_DIGEST_LENGTH)) { + DPRINTF("ram_handle_hash: local_page_hash=%s\n", md5s(local_page_hash)); + assert(0); + } + } +} + +static void add_remote_hash(ram_addr_t addr, uint8_t *hash) { + uint64_t page_nr = get_page_nr(addr); + memcpy(&hashes[page_nr * MD5_DIGEST_LENGTH], + hash, + MD5_DIGEST_LENGTH); +} + static int ram_load(QEMUFile *f, void *opaque, int version_id) { int flags = 0, ret = 0; @@ -1302,6 +1388,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) ram_addr_t addr, total_ram_bytes; void *host; uint8_t ch; + uint8_t hash[MD5_DIGEST_LENGTH]; addr = qemu_get_be64(f); flags = addr & ~TARGET_PAGE_MASK; @@ -1354,6 +1441,61 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) } ch = qemu_get_byte(f); ram_handle_compressed(host, ch, TARGET_PAGE_SIZE); + DPRINTF("ram_load: FLAG_COMPRESS, addr=%08lx ch=%u\n", addr, ch); + if (fd_checkpoint != -1) { + if (ch != 0) { + MD5(host, TARGET_PAGE_SIZE, hash); + add_remote_hash(addr, hash); + } else { + add_remote_hash(addr, all_zeroes_hash); + } + } + break; + case RAM_SAVE_FLAG_HASH: + host = host_from_stream_offset(f, addr, flags); + if (!host) { + error_report("Illegal RAM offset " RAM_ADDR_FMT, addr); + ret = -EINVAL; + break; + } + +#ifdef DEBUG_HASH + uint8_t src_page[TARGET_PAGE_SIZE]; + qemu_get_buffer(f, src_page, TARGET_PAGE_SIZE); +#endif + qemu_get_buffer(f, hash, MD5_DIGEST_LENGTH); + + ram_handle_hash(host, addr, hash, TARGET_PAGE_SIZE); + 
+#ifdef DEBUG_HASH + uint8_t local_hash[MD5_DIGEST_LENGTH]; + MD5(host, TARGET_PAGE_SIZE, local_hash); + assert(0 == memcmp(local_hash, hash, MD5_DIGEST_LENGTH)); + + uint8_t src_page_hash[MD5_DIGEST_LENGTH]; + MD5(src_page, TARGET_PAGE_SIZE, src_page_hash); + assert(0 == memcmp(src_page_hash, hash, MD5_DIGEST_LENGTH)); + assert(0 == memcmp(src_page, host, TARGET_PAGE_SIZE)); +#endif + assert(is_ram_addr(host)); + add_remote_hash(addr, hash); + DPRINTF("ram_load: FLAG_HASH, recv_hash=%s, addr=%08lx\n", md5s(hash), addr); + break; + case RAM_SAVE_FLAG_PAGE_HASH: + host = host_from_stream_offset(f, addr, flags); + if (!host) { + error_report("Illegal RAM offset " RAM_ADDR_FMT, addr); + ret = -EINVAL; + break; + } + + qemu_get_buffer(f, host, TARGET_PAGE_SIZE); + + qemu_get_buffer(f, hash, MD5_DIGEST_LENGTH); + + assert(is_ram_addr(host)); + add_remote_hash(addr, hash); + DPRINTF("ram_load: FLAG_PAGE_HASH, hash=%s, addr=%08lx\n", md5s(hash), addr); break; case RAM_SAVE_FLAG_PAGE: host = host_from_stream_offset(f, addr, flags); @@ -1363,6 +1505,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) break; } qemu_get_buffer(f, host, TARGET_PAGE_SIZE); + + if (is_ram_addr(host)) { + uint8_t hash[MD5_DIGEST_LENGTH]; + MD5(host, TARGET_PAGE_SIZE, hash); + add_remote_hash(addr, hash); + DPRINTF("ram_load: FLAG_PAGE, addr=%08lx, hash=%s\n", addr, md5s(hash)); + } break; case RAM_SAVE_FLAG_XBZRLE: host = host_from_stream_offset(f, addr, flags); -- 2.0.5 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration
  2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration Bohdan Trach
@ 2015-11-17 12:26   ` Dr. David Alan Gilbert
  2015-11-17 15:38     ` Bohdan Trach
From: Dr. David Alan Gilbert @ 2015-11-17 12:26 UTC
To: Bohdan Trach; +Cc: Bohdan Trach, amit.shah, thomas.knauth, qemu-devel, quintela

* Bohdan Trach (bv.trach@gmail.com) wrote:
> From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de>
>
> Extend memory page saving and loading functions to utilize information
> available in checkpoints to avoid sending full pages over the network.
>
> Signed-off-by: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de>

There are a couple of things I don't understand about this:
  1) How does the source fill its hashes table? Is it just given the same
     dump file as the destination?
  2) Why does RAM_SAVE_FLAG_PAGE_HASH exist; if you're sending the full page
     to the destination, why do we also send the hash?

> --- > arch_init.c | 167 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 158 insertions(+), 9 deletions(-) > > diff --git a/arch_init.c b/arch_init.c > index eda86d4..fca56f0 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -128,6 +128,8 @@ static uint64_t bitmap_sync_count; > #define RAM_SAVE_FLAG_CONTINUE 0x20 > #define RAM_SAVE_FLAG_XBZRLE 0x40 > /* 0x80 is reserved in migration.h start with 0x100 next */ > +#define RAM_SAVE_FLAG_HASH 0x100 > +#define RAM_SAVE_FLAG_PAGE_HASH 0x200 > > static struct defconfig_file { > const char *filename; > @@ -790,6 +792,7 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset, > uint8_t *p; > int ret; > bool send_async = true; > + uint8_t hash[MD5_DIGEST_LENGTH]; > > p = memory_region_get_ram_ptr(mr) + offset; > > @@ -841,16 +844,45 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset, > > /* XBZRLE overflow or normal page */ > if (pages == -1) { > - *bytes_transferred += save_page_header(f, block, > - offset | RAM_SAVE_FLAG_PAGE); > - if (send_async) { > - qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE); > - } else { > - qemu_put_buffer(f, p, TARGET_PAGE_SIZE); > + if (is_outgoing_with_checkpoint() && > + 0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))) { > + MD5(p, TARGET_PAGE_SIZE, hash); > + > + if (NULL != bsearch(hash, hashes, hashes_entries, > + MD5_DIGEST_LENGTH, uint128_compare)) { > + > + *bytes_transferred += save_page_header(f, block, offset | RAM_SAVE_FLAG_HASH); > +#ifdef DEBUG_HASH > + qemu_put_buffer(f, p, TARGET_PAGE_SIZE); > + *bytes_transferred += TARGET_PAGE_SIZE; > +#endif > + qemu_put_buffer(f, hash, MD5_DIGEST_LENGTH); > + *bytes_transferred += MD5_DIGEST_LENGTH; > + pages = 1; > + DPRINTF("ram_save_page: FLAG_HASH guest_phy_addr=%08lx flags=%lx hash=%s\n", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_HASH)& ~TARGET_PAGE_MASK, md5s(hash)); > + } else { > + *bytes_transferred += save_page_header(f, block, offset | 
RAM_SAVE_FLAG_PAGE_HASH);
> +                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +                qemu_put_buffer(f, hash, MD5_DIGEST_LENGTH);

I think there's a problem here: given that the source is still running its CPU
and changing memory, it can be writing to the page at the same time, so the
page you send might not match the hash you send. We're guaranteed to resend
the page again if it was written to, but that still doesn't make these two
things match; although as I say above I'm not sure why SAVE_FLAG_PAGE_HASH
exists.

> +                *bytes_transferred += TARGET_PAGE_SIZE;
> +                *bytes_transferred += MD5_DIGEST_LENGTH;
> +                pages = 1;
> +                DPRINTF("ram_save_page: FLAG_PAGE_HASH guest_phy_addr=%08lx flags=%lx hash=%s\n", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_PAGE_HASH)& ~TARGET_PAGE_MASK, md5s(hash));
> +            }
> +        }
> +        if (pages == -1) {
> +            *bytes_transferred += save_page_header(f, block,
> +                                               offset | RAM_SAVE_FLAG_PAGE);
> +            if (send_async) {
> +                qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
> +            } else {
> +                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +            }
> +            *bytes_transferred += TARGET_PAGE_SIZE;
> +            pages = 1;
> +            acct_info.norm_pages++;
> +            DPRINTF("ram_save_page: FLAG_PAGE guest_phy_addr=%08lx flags=%lx", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_PAGE)& ~TARGET_PAGE_MASK);
>          }
> -        *bytes_transferred += TARGET_PAGE_SIZE;
> -        pages = 1;
> -        acct_info.norm_pages++;
>      }
>
>      XBZRLE_cache_unlock();
> @@ -963,6 +995,8 @@ void free_xbzrle_decoded_buf(void)
>      xbzrle_decoded_buf = NULL;
>  }
>
> +extern const char *checkpoint_path;
> +
>  static void migration_end(void)
>  {
>      if (migration_bitmap) {
> @@ -1281,6 +1315,58 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>      }
>  }
>
> +/**
> + * If migration source determined we already have the chunk, it only
> + * sends a hash of the page's content. Read it from local storage,
> + * e.g., an old checkpoint.
> + * @param host Address which, after this function, should have a content matching the functions 2nd parameter. > + * @param hash The hash value. > + * @param size Size of the memory region in bytes. Typically, size is a single page, e.g., 4 KiB. > + * @param fd file descriptor of checkpoint file > + */ > +void ram_handle_hash(void *host, uint64_t guest_phy_addr, uint8_t *hash, uint64_t size); > +void ram_handle_hash(void *host, uint64_t guest_phy_addr, uint8_t *hash, uint64_t size) > +{ > + assert(fd_checkpoint != -1); > + > + /* fprintf(stdout, "ram_handle_hash: incoming has %u!\n", hash); */ > + uint8_t local_page_hash[MD5_DIGEST_LENGTH]; > + MD5(host, TARGET_PAGE_SIZE, local_page_hash); > + > + if (0 != memcmp(local_page_hash, hash, MD5_DIGEST_LENGTH)) { > + /* Computed hash does not match the hash the migration source > + sent us for this page. */ > + hash_offset_entry* v = bsearch(hash, hash_offset_array, hash_offset_entries, > + sizeof(hash_offset_entry), cmp_hash_offset_entry); > + if (v == NULL) { > + /* For some reason the source thought the destination > + already has this block. But it doesn't. Hmmm ... 
*/ > + DPRINTF("ram_handle_hash: unknown hash %s at guest phy addr %08lx\n", md5s(hash), guest_phy_addr); > + assert(0); > + } > + > + DPRINTF("ram_handle_hash: guest_phy_addr=%08lx, hash=%s, offset=%08lx\n", guest_phy_addr, md5s(hash), v->offset); > + > + off_t offset_actual = lseek(fd_checkpoint, v->offset, SEEK_SET); > + assert(offset_actual == v->offset); > + > + ssize_t read_actual = read(fd_checkpoint, host, TARGET_PAGE_SIZE); > + assert(read_actual == TARGET_PAGE_SIZE); > + MD5(host, TARGET_PAGE_SIZE, local_page_hash); > + if (0 != memcmp(local_page_hash, hash, MD5_DIGEST_LENGTH)) { > + DPRINTF("ram_handle_hash: local_page_hash=%s\n", md5s(local_page_hash)); > + assert(0); > + } > + } > +} > + > +static void add_remote_hash(ram_addr_t addr, uint8_t *hash) { > + uint64_t page_nr = get_page_nr(addr); > + memcpy(&hashes[page_nr * MD5_DIGEST_LENGTH], > + hash, > + MD5_DIGEST_LENGTH); > +} > + > static int ram_load(QEMUFile *f, void *opaque, int version_id) > { > int flags = 0, ret = 0; > @@ -1302,6 +1388,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) > ram_addr_t addr, total_ram_bytes; > void *host; > uint8_t ch; > + uint8_t hash[MD5_DIGEST_LENGTH]; > > addr = qemu_get_be64(f); > flags = addr & ~TARGET_PAGE_MASK; > @@ -1354,6 +1441,61 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) > } > ch = qemu_get_byte(f); > ram_handle_compressed(host, ch, TARGET_PAGE_SIZE); > + DPRINTF("ram_load: FLAG_COMPRESS, addr=%08lx ch=%u\n", addr, ch); Generally try and use trace_ rather than DPRINTF. 
> + if (fd_checkpoint != -1) { > + if (ch != 0) { > + MD5(host, TARGET_PAGE_SIZE, hash); > + add_remote_hash(addr, hash); > + } else { > + add_remote_hash(addr, all_zeroes_hash); > + } > + } > + break; > + case RAM_SAVE_FLAG_HASH: > + host = host_from_stream_offset(f, addr, flags); > + if (!host) { > + error_report("Illegal RAM offset " RAM_ADDR_FMT, addr); > + ret = -EINVAL; > + break; > + } > + > +#ifdef DEBUG_HASH > + uint8_t src_page[TARGET_PAGE_SIZE]; > + qemu_get_buffer(f, src_page, TARGET_PAGE_SIZE); > +#endif > + qemu_get_buffer(f, hash, MD5_DIGEST_LENGTH); > + > + ram_handle_hash(host, addr, hash, TARGET_PAGE_SIZE); > + > +#ifdef DEBUG_HASH > + uint8_t local_hash[MD5_DIGEST_LENGTH]; > + MD5(host, TARGET_PAGE_SIZE, local_hash); > + assert(0 == memcmp(local_hash, hash, MD5_DIGEST_LENGTH)); > + > + uint8_t src_page_hash[MD5_DIGEST_LENGTH]; > + MD5(src_page, TARGET_PAGE_SIZE, src_page_hash); > + assert(0 == memcmp(src_page_hash, hash, MD5_DIGEST_LENGTH)); > + assert(0 == memcmp(src_page, host, TARGET_PAGE_SIZE)); > +#endif > + assert(is_ram_addr(host)); > + add_remote_hash(addr, hash); > + DPRINTF("ram_load: FLAG_HASH, recv_hash=%s, addr=%08lx\n", md5s(hash), addr); > + break; > + case RAM_SAVE_FLAG_PAGE_HASH: > + host = host_from_stream_offset(f, addr, flags); > + if (!host) { > + error_report("Illegal RAM offset " RAM_ADDR_FMT, addr); > + ret = -EINVAL; > + break; > + } > + > + qemu_get_buffer(f, host, TARGET_PAGE_SIZE); > + > + qemu_get_buffer(f, hash, MD5_DIGEST_LENGTH); > + > + assert(is_ram_addr(host)); > + add_remote_hash(addr, hash); > + DPRINTF("ram_load: FLAG_PAGE_HASH, hash=%s, addr=%08lx\n", md5s(hash), addr); > break; > case RAM_SAVE_FLAG_PAGE: > host = host_from_stream_offset(f, addr, flags); > @@ -1363,6 +1505,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) > break; > } > qemu_get_buffer(f, host, TARGET_PAGE_SIZE); > + > + if (is_ram_addr(host)) { > + uint8_t hash[MD5_DIGEST_LENGTH]; > + MD5(host, TARGET_PAGE_SIZE, hash); > 
+ add_remote_hash(addr, hash); > + DPRINTF("ram_load: FLAG_PAGE, addr=%08lx, hash=%s\n", addr, md5s(hash)); > + } > break; > case RAM_SAVE_FLAG_XBZRLE: > host = host_from_stream_offset(f, addr, flags); > -- > 2.0.5 Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration
  2015-11-17 12:26 ` Dr. David Alan Gilbert
@ 2015-11-17 15:38   ` Bohdan Trach
  2015-11-17 16:05     ` Dr. David Alan Gilbert
From: Bohdan Trach @ 2015-11-17 15:38 UTC
To: Dr. David Alan Gilbert; +Cc: amit.shah, thomas.knauth, qemu-devel, quintela

On 11/17/2015 01:26 PM, Dr. David Alan Gilbert wrote:
> There are a couple of things I don't understand about this:
>   1) How does the source fill its hashes table? Is it just given the same
>      dump file as the destination?
>   2) Why does RAM_SAVE_FLAG_PAGE_HASH exist; if you're sending the full page
>      to the destination, why do we also send the hash?

1. Migration source is assumed to have the same dump file as the
destination. The design was optimized for the case of ping-pong
migrations over SAN, where checkpoint file is always available. We
also have proof-of-concept code that transfers available hashes from
the migration destination to the source over the network, but it
didn't make it into these patches.

2. We send the hash to avoid hash calculations on the receiving side
to save some CPU time. This flag can be removed, as I don't think the
benefits it provides are big.

> I think there's a problem here that given the source is still running its CPU and changing
> memory; it can be writing to the page at the same time, so the page you send might not
> match the hash you send; we're guaranteed to resend the page again if it was written
> to, but that still doesn't make these two things match; although as I say above
> I'm not sure why SAVE_FLAG_PAGE_HASH exists.

This is true. In this case, we will just delete the SAVE_FLAG_PAGE_HASH flag.

> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

--
With best regards,
Bohdan Trach
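
The sender-side choice described in this exchange (send only a digest when the checkpoint already holds the page, otherwise send the whole page) can be sketched as below. This is an illustration under stated assumptions, not the patch's code: digests are abbreviated to `uint64_t` values for brevity, the wire cost of a digest is taken as 32 bytes as for SHA-256 (the posted patch used 16-byte MD5), the per-page header is left out of the accounting, and `classify_page`, `payload_bytes` and `total_payload` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE  ((size_t)4096) /* assumed page size */
#define SKETCH_DIGEST_LEN ((size_t)32)   /* SHA-256; MD5 would be 16 */

typedef enum { SEND_HASH_ONLY, SEND_FULL_PAGE } send_kind;

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* known_sorted holds the digests of all checkpoint pages in ascending
 * order. A page whose digest is found there is already available on the
 * destination, so only the digest has to cross the network. */
static send_kind classify_page(const uint64_t *known_sorted, size_t n_known,
                               uint64_t digest)
{
    assert(n_known == 0 || known_sorted != NULL);
    if (n_known == 0) {
        return SEND_FULL_PAGE; /* no checkpoint: always send the page */
    }
    return bsearch(&digest, known_sorted, n_known, sizeof(uint64_t), cmp_u64)
               ? SEND_HASH_ONLY
               : SEND_FULL_PAGE;
}

/* Payload bytes for one page record, excluding the page header. */
static size_t payload_bytes(send_kind kind)
{
    return kind == SEND_HASH_ONLY ? SKETCH_DIGEST_LEN : SKETCH_PAGE_SIZE;
}

/* Sum the payload for a batch of pages given their digests. */
static size_t total_payload(const uint64_t *known_sorted, size_t n_known,
                            const uint64_t *page_digests, size_t n_pages)
{
    size_t total = 0;
    for (size_t i = 0; i < n_pages; i++) {
        total += payload_bytes(
            classify_page(known_sorted, n_known, page_digests[i]));
    }
    return total;
}
```

For four pages of which two are found in the checkpoint, the payload under these assumptions is 2 * 32 + 2 * 4096 = 8256 bytes instead of 16384.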
* Re: [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration
  2015-11-17 15:38 ` Bohdan Trach
@ 2015-11-17 16:05   ` Dr. David Alan Gilbert
  2015-11-17 16:34     ` Bohdan Trach
From: Dr. David Alan Gilbert @ 2015-11-17 16:05 UTC
To: Bohdan Trach; +Cc: amit.shah, thomas.knauth, qemu-devel, quintela

* Bohdan Trach (bohdan.trach@mailbox.tu-dresden.de) wrote:
>
> On 11/17/2015 01:26 PM, Dr. David Alan Gilbert wrote:
> > There are a couple of things I don't understand about this:
> >   1) How does the source fill its hashes table? Is it just given the same
> >      dump file as the destination?
> >   2) Why does RAM_SAVE_FLAG_PAGE_HASH exist; if you're sending the full page
> >      to the destination, why do we also send the hash?
>
> 1. Migration source is assumed to have the same dump file as the
> destination. The design was optimized for the case of ping-pong
> migrations over SAN, where checkpoint file is always available. We
> also have proof-of-concept code that transfers available hashes from
> the migration destination to the source over the network, but it
> didn't make it into these patches.

OK, it's easy with the SAN then.

> 2. We send the hash to avoid hash calculations on the receiving side
> to save some CPU time. This flag can be removed, as I don't think the
> benefits it provides are big.

Why is the hash needed on the destination; if it's a page which the source
has decided isn't in a matching page, what does the destination use the
hash for?

> > I think there's a problem here that given the source is still running its CPU and changing
> > memory; it can be writing to the page at the same time, so the page you send might not
> > match the hash you send; we're guaranteed to resend the page again if it was written
> > to, but that still doesn't make these two things match; although as I say above
> > I'm not sure why SAVE_FLAG_PAGE_HASH exists.
>
> This is true. In this case, we will just delete the SAVE_FLAG_PAGE_HASH flag.

But how do you know to delete the SAVE_FLAG_PAGE_HASH flag?

Dave

> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> --
> With best regards,
> Bohdan Trach
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration
  2015-11-17 16:05 ` Dr. David Alan Gilbert
@ 2015-11-17 16:34   ` Bohdan Trach
  2015-11-17 16:39     ` Dr. David Alan Gilbert
From: Bohdan Trach @ 2015-11-17 16:34 UTC
To: Dr. David Alan Gilbert; +Cc: amit.shah, thomas.knauth, qemu-devel, quintela

On 11/17/2015 05:05 PM, Dr. David Alan Gilbert wrote:
> Why is the hash needed on the destination; if it's a page which the source
> has decided isn't in a matching page, what does the destination use the
> hash for?
>

After the migration has finished, the hashes are still stored in RAM
for the next migration, when the current destination becomes the new
migration source. This way there is no need to recompute the checksums
on the next migration -- they are already in RAM.

>>> I think there's a problem here that given the source is still running its CPU and changing
>>> memory; it can be writing to the page at the same time, so the page you send might not
>>> match the hash you send; we're guaranteed to resend the page again if it was written
>>> to, but that still doesn't make these two things match; although as I say above
>>> I'm not sure why SAVE_FLAG_PAGE_HASH exists.
>>
>> This is true. In this case, we will just delete the SAVE_FLAG_PAGE_HASH flag.
>
> But how do you know to delete the SAVE_FLAG_PAGE_HASH flag?
>

Sorry for not stating this clearly enough. We will remove this flag from
the code, and send pages with SAVE_FLAG_PAGE instead. In this case the
destination will compute the hash.

> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>

--
With best regards,
Bohdan Trach
* Re: [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration
  2015-11-17 16:34 ` Bohdan Trach
@ 2015-11-17 16:39   ` Dr. David Alan Gilbert
From: Dr. David Alan Gilbert @ 2015-11-17 16:39 UTC
To: Bohdan Trach; +Cc: amit.shah, thomas.knauth, qemu-devel, quintela

* Bohdan Trach (bohdan.trach@mailbox.tu-dresden.de) wrote:
>
> On 11/17/2015 05:05 PM, Dr. David Alan Gilbert wrote:
> > Why is the hash needed on the destination; if it's a page which the source
> > has decided isn't in a matching page, what does the destination use the
> > hash for?
> >
>
> After the migration has finished, the hashes are still stored in RAM
> for the next migration, when the current destination becomes the new
> migration source. This way there is no need to recompute the checksums
> on the next migration -- they are already in RAM.
>
> >>> I think there's a problem here that given the source is still running its CPU and changing
> >>> memory; it can be writing to the page at the same time, so the page you send might not
> >>> match the hash you send; we're guaranteed to resend the page again if it was written
> >>> to, but that still doesn't make these two things match; although as I say above
> >>> I'm not sure why SAVE_FLAG_PAGE_HASH exists.
> >>
> >> This is true. In this case, we will just delete the SAVE_FLAG_PAGE_HASH flag.
> >
> > But how do you know to delete the SAVE_FLAG_PAGE_HASH flag?
> >
>
> Sorry for not stating this clearly enough. We will remove this flag from
> the code, and send pages with SAVE_FLAG_PAGE instead. In this case the
> destination will compute the hash.

OK, that's fine.

Dave

> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
>
> --
> With best regards,
> Bohdan Trach
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
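
The outcome agreed in this sub-thread (send plain pages and let the destination compute the hash) can be sketched as follows. The point of hashing on the destination is that the stored digest is derived from the bytes that were actually stored, so the two cannot disagree even if the source page changes after it was read. This is only a sketch: it assumes a 4 KiB page, uses a toy FNV-1a digest in place of a real hash, and `dest_state` and `receive_full_page` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE ((size_t)4096) /* assumed page size */

typedef struct {
    uint8_t  *ram;     /* destination guest RAM, n_pages pages long */
    uint64_t *digests; /* one digest per page, kept for the next migration */
    size_t    n_pages;
} dest_state;

/* Toy FNV-1a digest: a stand-in for a real hash, NOT collision resistant. */
static uint64_t toy_hash(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Handle a plain full-page record: store the bytes, then hash what was
 * stored. Returns 0 on success and -1 if the page number is out of range. */
static int receive_full_page(dest_state *d, size_t page_nr,
                             const uint8_t *wire)
{
    assert(d != NULL && wire != NULL);
    if (page_nr >= d->n_pages) {
        return -1;
    }
    uint8_t *dst = d->ram + page_nr * SKETCH_PAGE_SIZE;
    memcpy(dst, wire, SKETCH_PAGE_SIZE);
    d->digests[page_nr] = toy_hash(dst, SKETCH_PAGE_SIZE);
    return 0;
}
```

If the source later resends the same page because it was dirtied, the same routine overwrites both the bytes and the digest, so the table stays usable when the destination becomes the next migration source.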
* Re: [Qemu-devel] [PATCH RFC, Ping 0/3] Checkpoint-assisted migration proposal 2015-04-17 12:12 [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal Bohdan Trach ` (2 preceding siblings ...) 2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration Bohdan Trach @ 2015-04-24 11:38 ` Bohdan Trach 2015-05-11 11:13 ` Amit Shah 2015-09-15 10:39 ` [Qemu-devel] [PATCH RFC " Amit Shah 4 siblings, 1 reply; 23+ messages in thread From: Bohdan Trach @ 2015-04-24 11:38 UTC (permalink / raw) To: Bohdan Trach, qemu-devel; +Cc: amit.shah, thomas.knauth, quintela Ping. The patches are: http://patchwork.ozlabs.org/patch/462043/ http://patchwork.ozlabs.org/patch/462040/ http://patchwork.ozlabs.org/patch/462045/ Description: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01555.html https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg02014.html Feedback is most welcome on the following aspects: 1. Code related to migration (ram_save_page, ram_load). 2. Hashing function choice (adding OpenSSL as a dependency just for one function is probably not the best option). 3. Overall code organization. Thank You! On 04/17/2015 02:12 PM, Bohdan Trach wrote: > From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> > > This patchset contains a checkpoint-assisted migration feature as > proposed earlier on this list [1]. It allows reusing existing memory > snapshots of guests to speed up migration of VMs between physical > hosts. > > [1] https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01555.html > > Bohdan Trach (3): > memory: Add dump-pc-mem command for checkpointing > memory: implement checkpoint handling > migration: use checkpoint during migration -- With best regards, Bohdan Trach ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC, Ping 0/3] Checkpoint-assisted migration proposal 2015-04-24 11:38 ` [Qemu-devel] [PATCH RFC, Ping 0/3] Checkpoint-assisted migration proposal Bohdan Trach @ 2015-05-11 11:13 ` Amit Shah 2015-06-09 10:00 ` Bohdan Trach 0 siblings, 1 reply; 23+ messages in thread From: Amit Shah @ 2015-05-11 11:13 UTC (permalink / raw) To: Bohdan Trach; +Cc: quintela, thomas.knauth, Bohdan Trach, qemu-devel On (Fri) 24 Apr 2015 [13:38:54], Bohdan Trach wrote: > Ping. It's taking a while, just because there are some other patches in the queue. I'll get to this soon. Thanks for your patience. Amit ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC, Ping 0/3] Checkpoint-assisted migration proposal 2015-05-11 11:13 ` Amit Shah @ 2015-06-09 10:00 ` Bohdan Trach 2015-08-19 9:19 ` Bohdan Trach 0 siblings, 1 reply; 23+ messages in thread From: Bohdan Trach @ 2015-06-09 10:00 UTC (permalink / raw) To: Amit Shah; +Cc: thomas.knauth, qemu-devel, quintela Ping. The patches are: http://patchwork.ozlabs.org/patch/462043/ http://patchwork.ozlabs.org/patch/462040/ http://patchwork.ozlabs.org/patch/462045/ Description: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01555.html https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg02014.html Also, this code is being extended to also optionally use deduplication and dirty page tracking to save even more bandwidth. -- With best regards, Bohdan Trach On 05/11/2015 01:13 PM, Amit Shah wrote: > On (Fri) 24 Apr 2015 [13:38:54], Bohdan Trach wrote: >> Ping. > > It's taking a while, just because there are some other patches in the > queue. I'll get to this soon. > > Thanks for your patience. > > Amit > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC, Ping 0/3] Checkpoint-assisted migration proposal 2015-06-09 10:00 ` Bohdan Trach @ 2015-08-19 9:19 ` Bohdan Trach 0 siblings, 0 replies; 23+ messages in thread From: Bohdan Trach @ 2015-08-19 9:19 UTC (permalink / raw) To: qemu-devel; +Cc: Amit Shah, thomas.knauth, quintela One more ping. Clearly, this patch set now requires porting to the latest QEMU, but before doing that, I would like to know if there is any interest at all in merging this feature. The patches are: http://patchwork.ozlabs.org/patch/462043/ http://patchwork.ozlabs.org/patch/462040/ http://patchwork.ozlabs.org/patch/462045/ Description: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01555.html https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg02014.html -- With best regards, Bohdan Trach On 06/09/2015 12:00 PM, Bohdan Trach wrote: > Ping. > > The patches are: > http://patchwork.ozlabs.org/patch/462043/ > http://patchwork.ozlabs.org/patch/462040/ > http://patchwork.ozlabs.org/patch/462045/ > > Description: > https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01555.html > https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg02014.html > > Also, this code is being extended to also optionally use deduplication and dirty > page tracking to save even more bandwidth. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal 2015-04-17 12:12 [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal Bohdan Trach ` (3 preceding siblings ...) 2015-04-24 11:38 ` [Qemu-devel] [PATCH RFC, Ping 0/3] Checkpoint-assisted migration proposal Bohdan Trach @ 2015-09-15 10:39 ` Amit Shah 2015-10-05 8:33 ` Thomas Knauth 4 siblings, 1 reply; 23+ messages in thread From: Amit Shah @ 2015-09-15 10:39 UTC (permalink / raw) To: Bohdan Trach; +Cc: Bohdan Trach, thomas.knauth, qemu-devel, quintela Hi, On (Fri) 17 Apr 2015 [14:12:59], Bohdan Trach wrote: > From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de> > > This patchset contains a checkpoint-assisted migration feature as > proposed earlier on this list [1]. It allows reusing existing memory > snapshots of guests to speed up migration of VMs between physical > hosts. > > [1] https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01555.html Could you please include a file in the docs/ directory that documents how this works, so it's easier to comment on the general idea? From 'checkpointing', I was afraid this was going to use some checkpoint-restore framework, but instead it's a new checkpointing method that you're adding to qemu. Can you describe when checkpoints are taken, and what is checkpointed? How is it stored on the disk? I'm sure the patches have all the details, but it's easier to check the soundness of the idea if there's a high-level doc that explains this, and then we can discuss the finer points over patches. Overall, I think this approach can benefit some workloads, and since it's not affecting a lot of common code, we could look at adding it. Also, apologies for not getting to this earlier. Thanks, Amit ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal 2015-09-15 10:39 ` [Qemu-devel] [PATCH RFC " Amit Shah @ 2015-10-05 8:33 ` Thomas Knauth 2015-10-05 8:59 ` Amit Shah 0 siblings, 1 reply; 23+ messages in thread From: Thomas Knauth @ 2015-10-05 8:33 UTC (permalink / raw) To: Amit Shah; +Cc: Bohdan Trach, quintela, Bohdan Trach, qemu-devel Hi Amit, On Tue, Sep 15, 2015 at 12:39 PM, Amit Shah <amit.shah@redhat.com> wrote: > Could you please include a file in the docs/ directory that documents > how this works, so it's easier to comment on the general idea? sure, we will add this. > From 'checkpointing', I was afraid this was going to use some > checkpoint-restore framework, but instead it's a new checkpointing > method that you're adding to qemu. > > Can you describe when checkpoints are taken, and what is checkpointed? > How is it stored on the disk? Checkpoints are taken after a migration (at the source). If a checkpoint exists at the destination, the VM's state is reconstructed from the local checkpoint as well as updated pages from the source. This checkpoint-assisted migration can be faster, if network is the bottleneck, and saves network bandwidth. We can, in principle, reuse the existing checkpoint format of QEMU. The current implementation writes its own checkpoint because it was less effort on our side. We write the VM's main memory into a single file. > I'm sure the patches have all the details, but it's easier to check > the soundness of the idea if there's a high-level doc that explains > this, and then we can discuss the finer points over patches. We've recently published a paper about the general idea and expected benefits for a number of workloads ( http://se.inf.tu-dresden.de/pubs/papers/knauth2015vecycle.pdf ) > Overall, I think this approach can benefit some workloads, and since > it's not affecting a lot of common code, we could look at adding it. > > Also, apologies for not getting to this earlier. Kind regards, Thomas. 
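The reconstruction described above (local checkpoint plus updated pages from the source) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch set: PAGE_SIZE, page_hash() and rebuild_ram() are hypothetical names, the toy FNV-1a hash stands in for whatever hash the patches use, and the network is modelled by reading the source's memory directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u  /* hypothetical page size for this sketch */

/* Toy 64-bit FNV-1a hash; a stand-in only, not the hash used by the patches. */
static uint64_t page_hash(const uint8_t *page)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/*
 * Destination side: rebuild guest RAM page by page. A page is taken from
 * the local checkpoint when its hash equals the hash of the source's page;
 * otherwise the page contents are taken from the source (standing in for
 * a network transfer). Returns the number of pages that had to be sent.
 * Note: equal hashes are treated as equal pages, which is exactly where
 * the collision question raised in this thread applies.
 */
static size_t rebuild_ram(uint8_t *ram, const uint8_t *checkpoint,
                          const uint8_t *source_ram, size_t n_pages)
{
    size_t transferred = 0;
    for (size_t i = 0; i < n_pages; i++) {
        const uint8_t *src_page = source_ram + i * PAGE_SIZE;
        const uint8_t *ckpt_page = checkpoint + i * PAGE_SIZE;
        if (page_hash(src_page) == page_hash(ckpt_page)) {
            memcpy(ram + i * PAGE_SIZE, ckpt_page, PAGE_SIZE);
        } else {
            memcpy(ram + i * PAGE_SIZE, src_page, PAGE_SIZE);
            transferred++;
        }
    }
    return transferred;
}
```

With a four-page guest in which only one page changed since the checkpoint, rebuild_ram() would report a single transferred page; the other three are served from local disk, which is where the bandwidth saving comes from when the network is the bottleneck.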
* Re: [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal 2015-10-05 8:33 ` Thomas Knauth @ 2015-10-05 8:59 ` Amit Shah 0 siblings, 0 replies; 23+ messages in thread From: Amit Shah @ 2015-10-05 8:59 UTC (permalink / raw) To: Thomas Knauth; +Cc: Bohdan Trach, quintela, Bohdan Trach, qemu-devel On (Mon) 05 Oct 2015 [10:33:01], Thomas Knauth wrote: > Hi Amit, > > On Tue, Sep 15, 2015 at 12:39 PM, Amit Shah <amit.shah@redhat.com> wrote: > > Could you please include a file in the docs/ directory that documents > > how this works, so it's easier to comment on the general idea? > > sure, we will add this. Thanks! > > From 'checkpointing', I was afraid this was going to use some > > checkpoint-restore framework, but instead it's a new checkpointing > > method that you're adding to qemu. > > > > Can you describe when checkpoints are taken, and what is checkpointed? > > How is it stored on the disk? > > Checkpoints are taken after a migration (at the source). If a > checkpoint exists at the destination, the VM's state is reconstructed > from the local checkpoint as well as updated pages from the source. > This checkpoint-assisted migration can be faster, if network is the > bottleneck, and saves network bandwidth. > > We can, in principle, reuse the existing checkpoint format of QEMU. > The current implementation writes its own checkpoint because it was > less effort on our side. We write the VM's main memory into a single > file. > > > I'm sure the patches have all the details, but it's easier to check > > the soundness of the idea if there's a high-level doc that explains > > this, and then we can discuss the finer points over patches. > > We've recently published a paper about the general idea and expected > benefits for a number of workloads ( > http://se.inf.tu-dresden.de/pubs/papers/knauth2015vecycle.pdf ) I'll give it a look, thanks. I'm interested in knowing what workloads benefit. 
There was one outstanding question from Dave about collisions: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01614.html Can you please address that in your next submission? > > Overall, I think this approach can benefit some workloads, and since > > it's not affecting a lot of common code, we could look at adding it. > > > > Also, apologies for not getting to this earlier. > > Kind regards, > Thomas. Thanks, Amit ^ permalink raw reply [flat|nested] 23+ messages in thread