Re: [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Bohdan Trach <bv.trach@gmail.com>
Cc: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de>,
	amit.shah@redhat.com, thomas.knauth@googlemail.com,
	qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration
Date: Tue, 17 Nov 2015 12:26:02 +0000	[thread overview]
Message-ID: <20151117122601.GE2498@work-vm> (raw)
In-Reply-To: <6a726a501c5dcead1d19f42a43a428baae63ccd7.1429272036.git.bv.trach@gmail.com>

* Bohdan Trach (bv.trach@gmail.com) wrote:
> From: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de>
> 
> Extend memory page saving and loading functions to utilize information
> available in checkpoints to avoid sending full pages over the network.
> 
> Signed-off-by: Bohdan Trach <bohdan.trach@mailbox.tu-dresden.de>

There are a couple of things I don't understand about this:
  1) How does the source fill it's hashes table?  Is it just given the same
     dump file as the destination?
  2) Why does RAM_SAVE_FLAG_PAGE_HASH exist; if you're sending the full page
     to the destination, why do we also send the hash?

> ---
>  arch_init.c | 167 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 158 insertions(+), 9 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index eda86d4..fca56f0 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -128,6 +128,8 @@ static uint64_t bitmap_sync_count;
>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>  #define RAM_SAVE_FLAG_XBZRLE   0x40
>  /* 0x80 is reserved in migration.h start with 0x100 next */
> +#define RAM_SAVE_FLAG_HASH     0x100
> +#define RAM_SAVE_FLAG_PAGE_HASH     0x200
>  
>  static struct defconfig_file {
>      const char *filename;
> @@ -790,6 +792,7 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>      uint8_t *p;
>      int ret;
>      bool send_async = true;
> +    uint8_t hash[MD5_DIGEST_LENGTH];
>  
>      p = memory_region_get_ram_ptr(mr) + offset;
>  
> @@ -841,16 +844,45 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>  
>      /* XBZRLE overflow or normal page */
>      if (pages == -1) {
> -        *bytes_transferred += save_page_header(f, block,
> -                                               offset | RAM_SAVE_FLAG_PAGE);
> -        if (send_async) {
> -            qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
> -        } else {
> -            qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +        if (is_outgoing_with_checkpoint() &&
> +            0 == strncmp(block->mr->name, "pc.ram", strlen("pc.ram"))) {
> +            MD5(p, TARGET_PAGE_SIZE, hash);
> +
> +            if (NULL != bsearch(hash, hashes, hashes_entries,
> +                                MD5_DIGEST_LENGTH, uint128_compare)) {
> +
> +                *bytes_transferred += save_page_header(f, block, offset | RAM_SAVE_FLAG_HASH);
> +#ifdef DEBUG_HASH
> +                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +                *bytes_transferred += TARGET_PAGE_SIZE;
> +#endif
> +                qemu_put_buffer(f, hash, MD5_DIGEST_LENGTH);
> +                *bytes_transferred += MD5_DIGEST_LENGTH;
> +                pages = 1;
> +                DPRINTF("ram_save_page: FLAG_HASH guest_phy_addr=%08lx flags=%lx hash=%s\n", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_HASH)& ~TARGET_PAGE_MASK, md5s(hash));
> +            } else {
> +                *bytes_transferred += save_page_header(f, block, offset | RAM_SAVE_FLAG_PAGE_HASH);
> +                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +                qemu_put_buffer(f, hash, MD5_DIGEST_LENGTH);

I think there's a problem here that given the source is still running it's CPU and changing
memory; it can be writing to the page at the same time, so the page you send might not
match the hash you send;  we're guaranteed to resend the page again if it was written
to, but that still doesn't make these two things match; although as I say above
I'm not sure why SAVE_FLAG_PAGE_HASH exists.

> +                *bytes_transferred += TARGET_PAGE_SIZE;
> +                *bytes_transferred += MD5_DIGEST_LENGTH;
> +                pages = 1;
> +                DPRINTF("ram_save_page: FLAG_PAGE_HASH guest_phy_addr=%08lx flags=%lx hash=%s\n", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_PAGE_HASH)& ~TARGET_PAGE_MASK, md5s(hash));
> +            }
> +        }
> +        if (pages == -1) {
> +            *bytes_transferred += save_page_header(f, block,
> +                                                   offset | RAM_SAVE_FLAG_PAGE);
> +            if (send_async) {
> +                qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
> +            } else {
> +                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +            }
> +            *bytes_transferred += TARGET_PAGE_SIZE;
> +            pages = 1;
> +            acct_info.norm_pages++;
> +            DPRINTF("ram_save_page: FLAG_PAGE guest_phy_addr=%08lx flags=%lx", offset&TARGET_PAGE_MASK, (offset | RAM_SAVE_FLAG_PAGE)& ~TARGET_PAGE_MASK);
>          }
> -        *bytes_transferred += TARGET_PAGE_SIZE;
> -        pages = 1;
> -        acct_info.norm_pages++;
>      }
>  
>      XBZRLE_cache_unlock();
> @@ -963,6 +995,8 @@ void free_xbzrle_decoded_buf(void)
>      xbzrle_decoded_buf = NULL;
>  }
>  
> +extern const char *checkpoint_path;
> +
>  static void migration_end(void)
>  {
>      if (migration_bitmap) {
> @@ -1281,6 +1315,58 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>      }
>  }
>  
> +/**
> + * If migration source determined we already have the chunk, it only
> + * sends a hash of the page's content. Read it from local storage,
> + * e.g., an old checkpoint.
> + * @param host Address which, after this function, should have a content matching the functions 2nd parameter.
> + * @param hash The hash value.
> + * @param size Size of the memory region in bytes. Typically, size is a single page, e.g., 4 KiB.
> + * @param fd file descriptor of checkpoint file
> + */
> +void ram_handle_hash(void *host, uint64_t guest_phy_addr, uint8_t *hash, uint64_t size);
> +void ram_handle_hash(void *host, uint64_t guest_phy_addr, uint8_t *hash, uint64_t size)
> +{
> +    assert(fd_checkpoint != -1);
> +
> +    /* fprintf(stdout, "ram_handle_hash: incoming has %u!\n", hash); */
> +    uint8_t local_page_hash[MD5_DIGEST_LENGTH];
> +    MD5(host, TARGET_PAGE_SIZE, local_page_hash);
> +
> +    if (0 != memcmp(local_page_hash, hash, MD5_DIGEST_LENGTH)) {
> +        /* Computed hash does not match the hash the migration source
> +           sent us for this page. */
> +        hash_offset_entry* v = bsearch(hash, hash_offset_array, hash_offset_entries,
> +                                       sizeof(hash_offset_entry), cmp_hash_offset_entry);
> +        if (v == NULL) {
> +            /* For some reason the source thought the destination
> +               already has this block. But it doesn't. Hmmm ... */
> +            DPRINTF("ram_handle_hash: unknown hash %s at guest phy addr %08lx\n", md5s(hash), guest_phy_addr);
> +            assert(0);
> +        }
> +
> +        DPRINTF("ram_handle_hash: guest_phy_addr=%08lx, hash=%s, offset=%08lx\n", guest_phy_addr, md5s(hash), v->offset);
> +
> +        off_t offset_actual = lseek(fd_checkpoint, v->offset, SEEK_SET);
> +        assert(offset_actual == v->offset);
> +
> +        ssize_t read_actual = read(fd_checkpoint, host, TARGET_PAGE_SIZE);
> +        assert(read_actual == TARGET_PAGE_SIZE);
> +        MD5(host, TARGET_PAGE_SIZE, local_page_hash);
> +        if (0 != memcmp(local_page_hash, hash, MD5_DIGEST_LENGTH)) {
> +            DPRINTF("ram_handle_hash: local_page_hash=%s\n", md5s(local_page_hash));
> +            assert(0);
> +        }
> +    }
> +}
> +
> +static void add_remote_hash(ram_addr_t addr, uint8_t *hash) {
> +    uint64_t page_nr = get_page_nr(addr);
> +    memcpy(&hashes[page_nr * MD5_DIGEST_LENGTH],
> +           hash,
> +           MD5_DIGEST_LENGTH);
> +}
> +
>  static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      int flags = 0, ret = 0;
> @@ -1302,6 +1388,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>          ram_addr_t addr, total_ram_bytes;
>          void *host;
>          uint8_t ch;
> +        uint8_t hash[MD5_DIGEST_LENGTH];
>  
>          addr = qemu_get_be64(f);
>          flags = addr & ~TARGET_PAGE_MASK;
> @@ -1354,6 +1441,61 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              }
>              ch = qemu_get_byte(f);
>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> +            DPRINTF("ram_load: FLAG_COMPRESS, addr=%08lx ch=%u\n", addr, ch);

Generally try and use trace_ rather than DPRINTF.

> +            if (fd_checkpoint != -1) {
> +                if (ch != 0) {
> +                    MD5(host, TARGET_PAGE_SIZE, hash);
> +                    add_remote_hash(addr, hash);
> +                } else {
> +                    add_remote_hash(addr, all_zeroes_hash);
> +                }
> +            }
> +            break;
> +        case RAM_SAVE_FLAG_HASH:
> +            host = host_from_stream_offset(f, addr, flags);
> +            if (!host) {
> +                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> +                ret = -EINVAL;
> +                break;
> +            }
> +
> +#ifdef DEBUG_HASH
> +            uint8_t src_page[TARGET_PAGE_SIZE];
> +            qemu_get_buffer(f, src_page, TARGET_PAGE_SIZE);
> +#endif
> +            qemu_get_buffer(f, hash, MD5_DIGEST_LENGTH);
> +
> +            ram_handle_hash(host, addr, hash, TARGET_PAGE_SIZE);
> +
> +#ifdef DEBUG_HASH
> +            uint8_t local_hash[MD5_DIGEST_LENGTH];
> +            MD5(host, TARGET_PAGE_SIZE, local_hash);
> +            assert(0 == memcmp(local_hash, hash, MD5_DIGEST_LENGTH));
> +
> +            uint8_t src_page_hash[MD5_DIGEST_LENGTH];
> +            MD5(src_page, TARGET_PAGE_SIZE, src_page_hash);
> +            assert(0 == memcmp(src_page_hash, hash, MD5_DIGEST_LENGTH));
> +            assert(0 == memcmp(src_page, host, TARGET_PAGE_SIZE));
> +#endif
> +            assert(is_ram_addr(host));
> +            add_remote_hash(addr, hash);
> +            DPRINTF("ram_load: FLAG_HASH, recv_hash=%s, addr=%08lx\n", md5s(hash), addr);
> +            break;
> +        case RAM_SAVE_FLAG_PAGE_HASH:
> +            host = host_from_stream_offset(f, addr, flags);
> +            if (!host) {
> +                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> +                ret = -EINVAL;
> +                break;
> +            }
> +
> +            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> +
> +            qemu_get_buffer(f, hash, MD5_DIGEST_LENGTH);
> +
> +            assert(is_ram_addr(host));
> +            add_remote_hash(addr, hash);
> +            DPRINTF("ram_load: FLAG_PAGE_HASH, hash=%s, addr=%08lx\n", md5s(hash), addr);
>              break;
>          case RAM_SAVE_FLAG_PAGE:
>              host = host_from_stream_offset(f, addr, flags);
> @@ -1363,6 +1505,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  break;
>              }
>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> +
> +            if (is_ram_addr(host)) {
> +                uint8_t hash[MD5_DIGEST_LENGTH];
> +                MD5(host, TARGET_PAGE_SIZE, hash);
> +                add_remote_hash(addr, hash);
> +                DPRINTF("ram_load: FLAG_PAGE, addr=%08lx, hash=%s\n", addr, md5s(hash));
> +            }
>              break;
>          case RAM_SAVE_FLAG_XBZRLE:
>              host = host_from_stream_offset(f, addr, flags);
> -- 
> 2.0.5

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

next prev parent reply	other threads:[~2015-11-17 12:26 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-17 12:12 [Qemu-devel] [PATCH RFC 0/3] Checkpoint-assisted migration proposal Bohdan Trach
2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 1/3] memory: Add dump-pc-mem command for checkpointing Bohdan Trach
2015-04-17 13:53   ` Eric Blake
2015-04-18  7:40     ` Bohdan Trach
2015-11-16 16:46   ` Dr. David Alan Gilbert
2015-11-17 15:38     ` Bohdan Trach
2015-11-17 16:02       ` Dr. David Alan Gilbert
2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 2/3] memory: implement checkpoint handling Bohdan Trach
2015-11-16 16:56   ` Dr. David Alan Gilbert
2015-11-17 15:38     ` Bohdan Trach
2015-04-17 12:13 ` [Qemu-devel] [PATCH RFC 3/3] migration: use checkpoint during migration Bohdan Trach
2015-11-17 12:26   ` Dr. David Alan Gilbert [this message]
2015-11-17 15:38     ` Bohdan Trach
2015-11-17 16:05       ` Dr. David Alan Gilbert
2015-11-17 16:34         ` Bohdan Trach
2015-11-17 16:39           ` Dr. David Alan Gilbert
2015-04-24 11:38 ` [Qemu-devel] [PATCH RFC, Ping 0/3] Checkpoint-assisted migration proposal Bohdan Trach
2015-05-11 11:13   ` Amit Shah
2015-06-09 10:00     ` Bohdan Trach
2015-08-19  9:19       ` Bohdan Trach
2015-09-15 10:39 ` [Qemu-devel] [PATCH RFC " Amit Shah
2015-10-05  8:33   ` Thomas Knauth
2015-10-05  8:59     ` Amit Shah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151117122601.GE2498@work-vm \
    --to=dgilbert@redhat.com \
    --cc=amit.shah@redhat.com \
    --cc=bohdan.trach@mailbox.tu-dresden.de \
    --cc=bv.trach@gmail.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=thomas.knauth@googlemail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).