From: Stefan Hajnoczi <stefanha@gmail.com>
To: "Shribman, Aidan" <aidan.shribman@sap.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH v2] XBRLE page delta compression for live migration of large memory apps
Date: Thu, 7 Jul 2011 09:23:24 +0100 [thread overview]
Message-ID: <20110707082324.GA14184@stefanha-thinkpad.localdomain> (raw)
In-Reply-To: <AB5A8C7661872E428D6B8E1C2DFA35085D80556BB8@DEWDFECCR02.wdf.sap.corp>
On Wed, Jul 06, 2011 at 02:01:58PM +0200, Shribman, Aidan wrote:
> XBRLE does not suit all scenarios but has been proven highly beneficial
> for VMs running payloads such as: SAP ERP systems; VLC transcoding; LMbench
> memory write benchmarks.
Another way of looking at this patch is as a profiling tool for identifying
workloads with poor TLB and cache behavior. The workloads that benefit from
this patch are dirtying memory in a sparse fashion, touching many pages but
making only small changes. These workloads are using data structures and
algorithms that are simply not TLB/cache-efficient.
Instead of spending effort in live migration compensating for this poor
workload behavior, why not fix the workload? The benefits are much greater:
poor TLB/cache usage affects the workload all the time, not just during live
migration and not just under virtualization. By fixing the workload you will
also get faster live migration.
That said, if we can improve live migration then this is a good thing. The
challenge is that this optimization is speculative. You need to do a lot of
work up-front: copying all pages through the cache and hoping their xor/rle
representation will be <1/3 TARGET_PAGE_SIZE. If there is a cache miss or the
xor/rle representation is not small enough then it's back to square one.
Any thoughts on reducing the overhead and making xbrle on by default?
> Work is based on research results published at VEE 2011: Evaluation of Delta
> Compression Techniques for Efficient Live Migration of Large Virtual Machines
> by Benoit, Svard, Tordsson and Elmroth.
I will read your paper. Did you try unconditionally applying a cheap
compression algorithm like the one Google recently published? That way you
just compress everything and don't need to keep the cache around:
http://code.google.com/p/snappy/
http://www.hypertable.org/doxygen/bmz_8h.html
> +static int save_xbrle_page(QEMUFile *f, uint8_t *current_data,
> + ram_addr_t current_addr, RAMBlock *block, ram_addr_t offset, int cont)
> +{
> + int cache_location = -1, slot = -1, encoded_len = 0, bytes_sent = 0;
> + XBRLEHeader hdr = {0};
> + CacheItem *it;
> + uint8_t *xor_buf = NULL, *xbrle_buf = NULL;
> +
> + /* get location */
> + slot = cache_is_cached(current_addr);
> + if (slot == -1) {
> + acct_info.xbrle_cache_miss++;
> + goto done;
> + }
> + cache_location = cache_get_cache_pos(current_addr);
> +
> + /* abort if page changed too much */
> + it = cache_item_get(cache_location, slot);
> +
> + /* XOR encoding */
> + xor_buf = (uint8_t *) qemu_mallocz(TARGET_PAGE_SIZE);
Zeroing unnecessary here.
> + xor_encode(xor_buf, it->it_data, current_data);
> +
> + /* XBRLE (XOR+RLE) encoding (if we can ensure a 1/3 ratio) */
> + xbrle_buf = (uint8_t *) qemu_mallocz(TARGET_PAGE_SIZE);
Why TARGET_PAGE_SIZE when the actual size is TARGET_PAGE_SIZE/3?
Zeroing unnecessary here.
> + encoded_len = rle_encode(xor_buf, TARGET_PAGE_SIZE, xbrle_buf,
> + TARGET_PAGE_SIZE/3);
> +
> + if (encoded_len < 0) {
> + DPRINTF("XBRLE encoding oeverflow - sending uncompressed\n");
s/oeverflow/overflow/
> + acct_info.xbrle_overflow++;
> + goto done;
> + }
> +
> + hdr.xh_len = encoded_len;
> + hdr.xh_flags |= ENCODING_FLAG_XBRLE;
> +
> + /* Send XBRLE compressed page */
> + save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBRLE);
> + qemu_put_buffer(f, (uint8_t *) &hdr, sizeof(hdr));
> + qemu_put_buffer(f, xbrle_buf, encoded_len);
> + acct_info.xbrle_pages++;
> + bytes_sent = encoded_len + sizeof(hdr);
> + acct_info.xbrle_bytes += bytes_sent;
> +
> +done:
> + qemu_free(xor_buf);
> + qemu_free(xbrle_buf);
> + return bytes_sent;
> +}
>
> static int is_dup_page(uint8_t *page, uint8_t ch)
> {
> @@ -107,7 +486,7 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
> static RAMBlock *last_block;
> static ram_addr_t last_offset;
>
> -static int ram_save_block(QEMUFile *f)
> +static int ram_save_block(QEMUFile *f, int stage)
> {
> RAMBlock *block = last_block;
> ram_addr_t offset = last_offset;
> @@ -128,28 +507,27 @@ static int ram_save_block(QEMUFile *f)
> current_addr + TARGET_PAGE_SIZE,
> MIGRATION_DIRTY_FLAG);
>
> - p = block->host + offset;
> + p = qemu_mallocz(TARGET_PAGE_SIZE);
Where is p freed when use_xbrle is off?
You should not introduce overhead in the case where use_xbrle is off. Please
make sure the malloc/memcpy only happens if the page is added to the cache.
> +static int load_xbrle(QEMUFile *f, ram_addr_t addr, void *host)
> +{
> + int ret, rc = -1;
> + uint8_t *prev_page, *xor_buf, *xbrle_buf;
> + XBRLEHeader hdr = {0};
> +
> + /* extract RLE header */
> + qemu_get_buffer(f, (uint8_t *) &hdr, sizeof(hdr));
> + if (!(hdr.xh_flags & ENCODING_FLAG_XBRLE)) {
> + fprintf(stderr, "Failed to load XBRLE page - wrong compression!\n");
> + goto done;
> + }
> +
> + if (hdr.xh_len > TARGET_PAGE_SIZE) {
> + fprintf(stderr, "Failed to load XBRLE page - len overflow!\n");
> + goto done;
> + }
> +
> + /* load data and decode */
> + xbrle_buf = (uint8_t *) qemu_mallocz(TARGET_PAGE_SIZE);
> + qemu_get_buffer(f, xbrle_buf, hdr.xh_len);
Why allocate TARGET_PAGE_SIZE instead of hdr.xh_len and why zero it when
qemu_get_buffer() will overwrite it?
> +
> + /* decode RLE */
> + xor_buf = (uint8_t *) qemu_mallocz(TARGET_PAGE_SIZE);
Again there is no need to zero the buffer.
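On the load side, the point of the xh_len bounds check can be sketched with a decoder for the same toy (count, byte) format used above; this is an illustration, not the patch's actual decoder:

```c
#include <stdint.h>
#include <string.h>

/* Toy decoder for (count, byte) pairs.  The incoming length comes off
 * the wire and must be treated as untrusted: reject malformed input and
 * anything that would write past the destination page. */
static int rle_decode(const uint8_t *src, size_t slen,
                      uint8_t *dst, size_t dlen)
{
    size_t in = 0, out = 0;

    while (in + 1 < slen) {
        size_t run = src[in];
        uint8_t byte = src[in + 1];

        if (run == 0 || out + run > dlen) {
            return -1;          /* malformed or would overflow the page */
        }
        memset(dst + out, byte, run);
        out += run;
        in += 2;
    }
    /* A trailing odd byte means the stream was truncated. */
    return (in == slen) ? (int)out : -1;
}
```

With a check like this in the decoder itself, an oversized or corrupt xh_len cannot smash the destination buffer regardless of what the header claims.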
Stefan
Thread overview: 7+ messages
2011-07-06 12:01 [Qemu-devel] [PATCH v2] XBRLE page delta compression for live migration of large memory apps Shribman, Aidan
2011-07-07 8:23 ` Stefan Hajnoczi [this message]
2011-08-02 13:45 ` Shribman, Aidan
2011-08-02 18:17 ` Stefan Hajnoczi
2011-07-07 21:07 ` Stefan Hajnoczi
2011-08-02 13:45 ` Shribman, Aidan
2011-08-02 15:53 ` Stefan Hajnoczi