* [Qemu-devel] [RFC] Split migration bitmaps by ramblock
@ 2017-03-23 21:01 Juan Quintela
2017-03-23 21:01 ` [Qemu-devel] [PATCH] ram: Split dirty bitmap by RAMBlock Juan Quintela
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Juan Quintela @ 2017-03-23 21:01 UTC (permalink / raw)
To: qemu-devel; +Cc: dgilbert
Hi
This series splits the migration and unsent bitmaps by RAMBlock.
This makes it easier to synchronize them in small bits. It is on top
of the RAMState and not-hotplug series.
Why?
reason 1:
People have complained that by the time that we detect that a page is
sent, it has already been marked dirty "again" inside kvm, so we are
going to send it again. On top of this patch, my idea is, for words
of the bitmap that have any bit set, to synchronize the bitmap just
before sending the pages. I have not looked into performance numbers
yet, just asking for comments about how it is done.
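To make that concrete, a minimal sketch of the per-word idea
(sync_dirty_log_range() and send_pages_in_word() are hypothetical
helpers, not existing APIs):

    /* Sketch only: re-sync a small range of the dirty log right
     * before sending it, one bitmap word (BITS_PER_LONG pages) at
     * a time.
     */
    unsigned long pages = block->used_length >> TARGET_PAGE_BITS;
    unsigned long words = BITS_TO_LONGS(pages);
    unsigned long i;

    for (i = 0; i < words; i++) {
        if (block->bmap[i]) {
            ram_addr_t start = (i * BITS_PER_LONG) << TARGET_PAGE_BITS;
            ram_addr_t len = BITS_PER_LONG << TARGET_PAGE_BITS;

            sync_dirty_log_range(block, start, len); /* refresh from KVM */
            send_pages_in_word(block, i);            /* then send */
        }
    }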
reason 2:
In cases where the host page is a multiple of TARGET_PAGE_SIZE, we do
a lot of work when we are synchronizing the bitmaps to convert them
to target page size. The idea is to change the bitmaps on those
RAMBlocks to mean host page size and not TARGET_PAGE_SIZE.
Note that there are two reasons for this: ARM and PPC do things like
guests with 4kb pages on hosts with 16/64kb pages, and then we have
HugePages. Note all the workarounds that postcopy has to do to work
at HugePage size.
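Allocation-wise that change is small; a sketch of the direction (the
patch below still works in TARGET_PAGE_SIZE units):

    /* Sketch only: one bit per *host* page instead of per target
     * page, so a 64kb-page or hugepage RAMBlock needs a smaller
     * bitmap and no target-page fan-out at sync time.
     */
    unsigned long pages = block->used_length / block->page_size;

    block->bmap = bitmap_new(pages);
    bitmap_set(block->bmap, 0, pages);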
Please, comment?
Later, Juan.
Juan Quintela (1):
ram: Split dirty bitmap by RAMBlock
include/exec/ram_addr.h | 13 +++-
migration/ram.c | 201 ++++++++++++++++++------------------------------
2 files changed, 85 insertions(+), 129 deletions(-)
--
2.9.3
* [Qemu-devel] [PATCH] ram: Split dirty bitmap by RAMBlock
2017-03-23 21:01 [Qemu-devel] [RFC] Split migration bitmaps by ramblock Juan Quintela
@ 2017-03-23 21:01 ` Juan Quintela
2017-03-24 1:57 ` [Qemu-devel] [RFC] Split migration bitmaps by ramblock Yang Hongyang
2017-03-28 19:05 ` Dr. David Alan Gilbert
2 siblings, 0 replies; 9+ messages in thread
From: Juan Quintela @ 2017-03-23 21:01 UTC (permalink / raw)
To: qemu-devel; +Cc: dgilbert
Both the ram bitmap and the unsent bitmap are split by RAMBlock.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
include/exec/ram_addr.h | 13 +++-
migration/ram.c | 201 ++++++++++++++++++------------------------------
2 files changed, 85 insertions(+), 129 deletions(-)
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c246b55..1ffdfee 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -39,6 +39,14 @@ struct RAMBlock {
QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
int fd;
size_t page_size;
+ /* dirty bitmap used during migration */
+ unsigned long *bmap;
+ /* bitmap of pages that haven't been sent even once
+ * only maintained and used in postcopy at the moment
+ * where it's used to send the dirtymap at the start
+ * of the postcopy phase
+ */
+ unsigned long *unsentmap;
};
static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
@@ -353,16 +361,15 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
static inline
-uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
- RAMBlock *rb,
+uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
ram_addr_t start,
ram_addr_t length,
int64_t *real_dirty_pages)
{
ram_addr_t addr;
- start = rb->offset + start;
unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
uint64_t num_dirty = 0;
+ unsigned long *dest = rb->bmap;
/* start address is aligned at the start of a word? */
if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
diff --git a/migration/ram.c b/migration/ram.c
index 045b899..30f241b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -138,19 +138,6 @@ out:
return ret;
}
-struct RAMBitmap {
- struct rcu_head rcu;
- /* Main migration bitmap */
- unsigned long *bmap;
- /* bitmap of pages that haven't been sent even once
- * only maintained and used in postcopy at the moment
- * where it's used to send the dirtymap at the start
- * of the postcopy phase
- */
- unsigned long *unsentmap;
-};
-typedef struct RAMBitmap RAMBitmap;
-
/*
* An outstanding page request, on the source, having been received
* and queued
@@ -222,8 +209,6 @@ struct RAMState {
bool preffer_xbzrle;
/* protects modification of the bitmap */
QemuMutex bitmap_mutex;
- /* Ram Bitmap protected by RCU */
- RAMBitmap *ram_bitmap;
/* The RAMBlock used in the last src_page_request */
RAMBlock *last_req_rb;
/* Queue of outstanding page requests from the destination */
@@ -616,22 +601,17 @@ static inline
unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
unsigned long start)
{
- unsigned long base = rb->offset >> TARGET_PAGE_BITS;
- unsigned long nr = base + start;
- uint64_t rb_size = rb->used_length;
- unsigned long size = base + (rb_size >> TARGET_PAGE_BITS);
- unsigned long *bitmap;
-
+ unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
+ unsigned long *bitmap = rb->bmap;
unsigned long next;
- bitmap = atomic_rcu_read(&rs->ram_bitmap)->bmap;
- if (rs->ram_bulk_stage && nr > base) {
- next = nr + 1;
+ if (rs->ram_bulk_stage && start > 0) {
+ next = start + 1;
} else {
- next = find_next_bit(bitmap, size, nr);
+ next = find_next_bit(bitmap, size, start);
}
- return next - base;
+ return next;
}
static inline bool migration_bitmap_clear_dirty(RAMState *rs,
@@ -639,10 +619,8 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
unsigned long page)
{
bool ret;
- unsigned long *bitmap = atomic_rcu_read(&rs->ram_bitmap)->bmap;
- unsigned long nr = (rb->offset >> TARGET_PAGE_BITS) + page;
- ret = test_and_clear_bit(nr, bitmap);
+ ret = test_and_clear_bit(page, rb->bmap);
if (ret) {
rs->migration_dirty_pages--;
@@ -653,10 +631,8 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
static void migration_bitmap_sync_range(RAMState *rs, RAMBlock *rb,
ram_addr_t start, ram_addr_t length)
{
- unsigned long *bitmap;
- bitmap = atomic_rcu_read(&rs->ram_bitmap)->bmap;
rs->migration_dirty_pages +=
- cpu_physical_memory_sync_dirty_bitmap(bitmap, rb, start, length,
+ cpu_physical_memory_sync_dirty_bitmap(rb, start, length,
&rs->num_dirty_pages_period);
}
@@ -1159,17 +1135,13 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
* search already sent it.
*/
if (block) {
- unsigned long *bitmap;
unsigned long page;
- bitmap = atomic_rcu_read(&rs->ram_bitmap)->bmap;
- page = (block->offset + offset) >> TARGET_PAGE_BITS;
- dirty = test_bit(page, bitmap);
+ page = offset >> TARGET_PAGE_BITS;
+ dirty = test_bit(page, block->bmap);
if (!dirty) {
trace_get_queued_page_not_dirty(block->idstr, (uint64_t)offset,
- page,
- test_bit(page,
- atomic_rcu_read(&rs->ram_bitmap)->unsentmap));
+ page, test_bit(page, block->unsentmap));
} else {
trace_get_queued_page(block->idstr, (uint64_t)offset, page);
}
@@ -1305,9 +1277,6 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
/* Check the pages is dirty and if it is send it */
if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
- unsigned long *unsentmap;
- unsigned long page =
- (pss->block->offset >> TARGET_PAGE_BITS) + pss->page;
if (!rs->preffer_xbzrle && migrate_use_compression()) {
res = ram_save_compressed_page(rs, pss, last_stage);
} else {
@@ -1317,9 +1286,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
if (res < 0) {
return res;
}
- unsentmap = atomic_rcu_read(&rs->ram_bitmap)->unsentmap;
- if (unsentmap) {
- clear_bit(page, unsentmap);
+ if (pss->block->unsentmap) {
+ clear_bit(pss->page, pss->block->unsentmap);
}
}
@@ -1449,25 +1417,20 @@ void free_xbzrle_decoded_buf(void)
xbzrle_decoded_buf = NULL;
}
-static void migration_bitmap_free(RAMBitmap *bmap)
-{
- g_free(bmap->bmap);
- g_free(bmap->unsentmap);
- g_free(bmap);
-}
-
static void ram_migration_cleanup(void *opaque)
{
- RAMState *rs = opaque;
+ RAMBlock *block;
/* caller have hold iothread lock or is in a bh, so there is
* no writing race against this migration_bitmap
*/
- RAMBitmap *bitmap = rs->ram_bitmap;
- atomic_rcu_set(&rs->ram_bitmap, NULL);
- if (bitmap) {
- memory_global_dirty_log_stop();
- call_rcu(bitmap, migration_bitmap_free, rcu);
+ memory_global_dirty_log_stop();
+
+ QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ g_free(block->bmap);
+ block->bmap = NULL;
+ g_free(block->unsentmap);
+ block->unsentmap = NULL;
}
XBZRLE_cache_lock();
@@ -1502,15 +1465,10 @@ static void ram_state_reset(RAMState *rs)
void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
{
unsigned long ram_pages = last_ram_page();
- RAMState *rs = &ram_state;
int64_t cur;
int64_t linelen = 128;
char linebuf[129];
- if (!todump) {
- todump = atomic_rcu_read(&rs->ram_bitmap)->bmap;
- }
-
for (cur = 0; cur < ram_pages; cur += linelen) {
int64_t curb;
bool found = false;
@@ -1537,14 +1495,12 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
void ram_postcopy_migrated_memory_release(MigrationState *ms)
{
- RAMState *rs = &ram_state;
struct RAMBlock *block;
- unsigned long *bitmap = atomic_rcu_read(&rs->ram_bitmap)->bmap;
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
- unsigned long first = block->offset >> TARGET_PAGE_BITS;
- unsigned long range = first + (block->used_length >> TARGET_PAGE_BITS);
- unsigned long run_start = find_next_zero_bit(bitmap, range, first);
+ unsigned long *bitmap = block->bmap;
+ unsigned long range = block->used_length >> TARGET_PAGE_BITS;
+ unsigned long run_start = find_next_zero_bit(bitmap, range, 0);
while (run_start < range) {
unsigned long run_end = find_next_bit(bitmap, range, run_start + 1);
@@ -1571,16 +1527,13 @@ void ram_postcopy_migrated_memory_release(MigrationState *ms)
*/
static int postcopy_send_discard_bm_ram(MigrationState *ms,
PostcopyDiscardState *pds,
- unsigned long start,
- unsigned long length)
+ RAMBlock *block)
{
- RAMState *rs = &ram_state;
- unsigned long end = start + length; /* one after the end */
+ unsigned long end = block->used_length >> TARGET_PAGE_BITS;
unsigned long current;
- unsigned long *unsentmap;
+ unsigned long *unsentmap = block->unsentmap;
- unsentmap = atomic_rcu_read(&rs->ram_bitmap)->unsentmap;
- for (current = start; current < end; ) {
+ for (current = 0; current < end; ) {
unsigned long one = find_next_bit(unsentmap, end, current);
if (one <= end) {
@@ -1633,8 +1586,7 @@ static int postcopy_each_ram_send_discard(MigrationState *ms)
* just needs indexes at this point, avoids it having
* target page specific code.
*/
- ret = postcopy_send_discard_bm_ram(ms, pds, first,
- block->used_length >> TARGET_PAGE_BITS);
+ ret = postcopy_send_discard_bm_ram(ms, pds, block);
postcopy_discard_send_finish(ms, pds);
if (ret) {
return ret;
@@ -1665,12 +1617,10 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
PostcopyDiscardState *pds)
{
RAMState *rs = &ram_state;
- unsigned long *bitmap;
- unsigned long *unsentmap;
+ unsigned long *bitmap = block->bmap;
+ unsigned long *unsentmap = block->unsentmap;
unsigned int host_ratio = block->page_size / TARGET_PAGE_SIZE;
- unsigned long first = block->offset >> TARGET_PAGE_BITS;
- unsigned long len = block->used_length >> TARGET_PAGE_BITS;
- unsigned long last = first + (len - 1);
+ unsigned long pages = block->used_length >> TARGET_PAGE_BITS;
unsigned long run_start;
if (block->page_size == TARGET_PAGE_SIZE) {
@@ -1678,18 +1628,15 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
return;
}
- bitmap = atomic_rcu_read(&rs->ram_bitmap)->bmap;
- unsentmap = atomic_rcu_read(&rs->ram_bitmap)->unsentmap;
-
if (unsent_pass) {
/* Find a sent page */
- run_start = find_next_zero_bit(unsentmap, last + 1, first);
+ run_start = find_next_zero_bit(unsentmap, pages, 0);
} else {
/* Find a dirty page */
- run_start = find_next_bit(bitmap, last + 1, first);
+ run_start = find_next_bit(bitmap, pages, 0);
}
- while (run_start <= last) {
+ while (run_start < pages) {
bool do_fixup = false;
unsigned long fixup_start_addr;
unsigned long host_offset;
@@ -1709,9 +1656,9 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
/* Find the end of this run */
unsigned long run_end;
if (unsent_pass) {
- run_end = find_next_bit(unsentmap, last + 1, run_start + 1);
+ run_end = find_next_bit(unsentmap, pages, run_start + 1);
} else {
- run_end = find_next_zero_bit(bitmap, last + 1, run_start + 1);
+ run_end = find_next_zero_bit(bitmap, pages, run_start + 1);
}
/*
* If the end isn't at the start of a host page, then the
@@ -1768,11 +1715,10 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
if (unsent_pass) {
/* Find the next sent page for the next iteration */
- run_start = find_next_zero_bit(unsentmap, last + 1,
- run_start);
+ run_start = find_next_zero_bit(unsentmap, pages, run_start);
} else {
/* Find the next dirty page for the next iteration */
- run_start = find_next_bit(bitmap, last + 1, run_start);
+ run_start = find_next_bit(bitmap, pages, run_start);
}
}
}
@@ -1838,39 +1784,41 @@ static int postcopy_chunk_hostpages(MigrationState *ms)
int ram_postcopy_send_discard_bitmap(MigrationState *ms)
{
RAMState *rs = &ram_state;
+ RAMBlock *block;
int ret;
- unsigned long *bitmap, *unsentmap;
rcu_read_lock();
/* This should be our last sync, the src is now paused */
migration_bitmap_sync(rs);
- unsentmap = atomic_rcu_read(&rs->ram_bitmap)->unsentmap;
- if (!unsentmap) {
- /* We don't have a safe way to resize the sentmap, so
- * if the bitmap was resized it will be NULL at this
- * point.
+ QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ unsigned long pages = block->used_length >> TARGET_PAGE_BITS;
+ unsigned long *bitmap = block->bmap;
+ unsigned long *unsentmap = block->unsentmap;
+
+ if (!unsentmap) {
+ /* We don't have a safe way to resize the sentmap, so
+ * if the bitmap was resized it will be NULL at this
+ * point.
+ */
+ error_report("migration ram resized during precopy phase");
+ rcu_read_unlock();
+ return -EINVAL;
+ }
+
+ /* Deal with TPS != HPS and huge pages */
+ ret = postcopy_chunk_hostpages(ms);
+ if (ret) {
+ rcu_read_unlock();
+ return ret;
+ }
+
+ /*
+ * Update the unsentmap to be unsentmap = unsentmap | dirty
*/
- error_report("migration ram resized during precopy phase");
- rcu_read_unlock();
- return -EINVAL;
+ bitmap_or(unsentmap, unsentmap, bitmap, pages);
}
-
- /* Deal with TPS != HPS and huge pages */
- ret = postcopy_chunk_hostpages(ms);
- if (ret) {
- rcu_read_unlock();
- return ret;
- }
-
- /*
- * Update the unsentmap to be unsentmap = unsentmap | dirty
- */
- bitmap = atomic_rcu_read(&rs->ram_bitmap)->bmap;
- bitmap_or(unsentmap, unsentmap, bitmap, last_ram_page());
-
-
trace_ram_postcopy_send_discard_bitmap();
#ifdef DEBUG_POSTCOPY
ram_debug_dump_bitmap(unsentmap, true);
@@ -1916,8 +1864,6 @@ err:
static int ram_state_init(RAMState *rs)
{
- unsigned long ram_bitmap_pages;
-
memset(rs, 0, sizeof(*rs));
qemu_mutex_init(&rs->bitmap_mutex);
qemu_mutex_init(&rs->src_page_req_mutex);
@@ -1959,16 +1905,19 @@ static int ram_state_init(RAMState *rs)
rcu_read_lock();
ram_state_reset(rs);
- rs->ram_bitmap = g_new0(RAMBitmap, 1);
/* Skip setting bitmap if there is no RAM */
if (ram_bytes_total()) {
- ram_bitmap_pages = last_ram_page();
- rs->ram_bitmap->bmap = bitmap_new(ram_bitmap_pages);
- bitmap_set(rs->ram_bitmap->bmap, 0, ram_bitmap_pages);
+ RAMBlock *block;
- if (migrate_postcopy_ram()) {
- rs->ram_bitmap->unsentmap = bitmap_new(ram_bitmap_pages);
- bitmap_set(rs->ram_bitmap->unsentmap, 0, ram_bitmap_pages);
+ QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
+
+ block->bmap = bitmap_new(pages);
+ bitmap_set(block->bmap, 0, pages);
+ if (migrate_postcopy_ram()) {
+ block->unsentmap = bitmap_new(pages);
+ bitmap_set(block->unsentmap, 0, pages);
+ }
}
}
--
2.9.3
* Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock
2017-03-23 21:01 [Qemu-devel] [RFC] Split migration bitmaps by ramblock Juan Quintela
2017-03-23 21:01 ` [Qemu-devel] [PATCH] ram: Split dirty bitmap by RAMBlock Juan Quintela
@ 2017-03-24 1:57 ` Yang Hongyang
2017-03-24 8:34 ` Juan Quintela
2017-03-28 19:05 ` Dr. David Alan Gilbert
2 siblings, 1 reply; 9+ messages in thread
From: Yang Hongyang @ 2017-03-24 1:57 UTC (permalink / raw)
To: Juan Quintela, qemu-devel; +Cc: dgilbert
Hi Juan,
First of all, I like the refactor patchset about RAMState, it makes
things clean, great!
On 2017/3/24 5:01, Juan Quintela wrote:
> Hi
>
> This series splits the migration and unsent bitmaps by RAMBlock.
> This makes it easier to synchronize them in small bits. It is on top
> of the RAMState and not-hotplug series.
>
> Why?
>
> reason 1:
>
> People have complained that by the time that we detect that a page is
> sent, it has already been marked dirty "again" inside kvm, so we are
> going to send it again. On top of this patch, my idea is, for words
> of the bitmap that have any bit set, to synchronize the bitmap just
> before sending the pages. I have not looked into performance numbers
> yet, just asking for comments about how it is done.
Here you said 'synchronize the bitmap just before sending the pages';
do you mean synchronize the bitmap from kvm? If so, I doubt the
performance, because every synchronization will require an ioctl().
If not, the per-block synchronization is useless.
Currently, the migration thread synchronizes the bitmap from kvm every
iteration (migration_bitmap_sync()). The RAMBlock already has a kind of
per-block bitmap for this kind of sync, and the migration bitmap is used
to put all those per-block bitmaps together for sending the data.
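Schematically, today's flow is (a simplified sketch; the helper names
are illustrative, the real code is migration_bitmap_sync() plus the
page-walking loop in ram.c):

    while (still_have_dirty_pages()) {
        migration_bitmap_sync(rs);   /* one ioctl per memory region */
        send_all_dirty_pages(rs);    /* pages may re-dirty meanwhile */
    }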
>
> reason 2:
>
> In cases where the host page is a multiple of TARGET_PAGE_SIZE, we do
> a lot of work when we are synchronizing the bitmaps to convert them
> to target page size. The idea is to change the bitmaps on those
> RAMBlocks to mean host page size and not TARGET_PAGE_SIZE.
>
> Note that there are two reasons for this: ARM and PPC do things like
> guests with 4kb pages on hosts with 16/64kb pages, and then we have
> HugePages. Note all the workarounds that postcopy has to do to work
> at HugePage size.
>
> Please, comment?
>
> Later, Juan.
>
> Juan Quintela (1):
> ram: Split dirty bitmap by RAMBlock
>
> include/exec/ram_addr.h | 13 +++-
> migration/ram.c | 201 ++++++++++++++++++------------------------------
> 2 files changed, 85 insertions(+), 129 deletions(-)
>
--
Thanks,
Yang
* Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock
2017-03-24 1:57 ` [Qemu-devel] [RFC] Split migration bitmaps by ramblock Yang Hongyang
@ 2017-03-24 8:34 ` Juan Quintela
2017-03-24 9:23 ` Yang Hongyang
0 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2017-03-24 8:34 UTC (permalink / raw)
To: Yang Hongyang; +Cc: qemu-devel, dgilbert
Yang Hongyang <yanghongyang@huawei.com> wrote:
> Hi Juan,
>
> First of all, I like the refactor patchset about RAMState, it makes
> things clean, great!
Thanks.
The whole idea of the series was to make testing changes easier.
> On 2017/3/24 5:01, Juan Quintela wrote:
>> Hi
>>
>> This series splits the migration and unsent bitmaps by RAMBlock.
>> This makes it easier to synchronize them in small bits. It is on top
>> of the RAMState and not-hotplug series.
>>
>> Why?
>>
>> reason 1:
>>
>> People have complained that by the time that we detect that a page is
>> sent, it has already been marked dirty "again" inside kvm, so we are
>> going to send it again. On top of this patch, my idea is, for words
>> of the bitmap that have any bit set, to synchronize the bitmap just
>> before sending the pages. I have not looked into performance numbers
>> yet, just asking for comments about how it is done.
>
> Here you said 'synchronize the bitmap just before sending the pages';
> do you mean synchronize the bitmap from kvm? If so, I doubt the
> performance, because every synchronization will require an ioctl().
> If not, the per-block synchronization is useless.
>
> Currently, the migration thread synchronizes the bitmap from kvm every
> iteration (migration_bitmap_sync()). The RAMBlock already has a kind of
> per-block bitmap for this kind of sync, and the migration bitmap is used
> to put all those per-block bitmaps together for sending the data.
Hi
For huge memory machines, we always do it in one go:

bitmap_sync(1TB RAM)

Then we walk the bitmap; by the time we have walked 512MB of RAM, it
is very probable that the page we are looking at is dirty again in the
KVM bitmap, so we send it, but as it is dirty again we will have to
send it once more in the next pass. That send is completely useless.
So my idea is to split things into smaller chunks. As we have to do
an ioctl, we wouldn't want to synchronize page by page, but perhaps
16MB at a time, or 64MB, anything less than the whole amount of memory.
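Per block, roughly (sketch only; sync_dirty_log_range() is the
per-range sync primitive we would have to add):

    const ram_addr_t chunk = 64 * 1024 * 1024;  /* 64MB, for example */
    ram_addr_t off;

    for (off = 0; off < block->used_length; off += chunk) {
        ram_addr_t len = MIN(chunk, block->used_length - off);

        sync_dirty_log_range(block, off, len);   /* one smaller ioctl */
        send_dirty_pages_range(block, off, len); /* send while fresh */
    }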
Later, Juan.
* Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock
2017-03-24 8:34 ` Juan Quintela
@ 2017-03-24 9:23 ` Yang Hongyang
2017-03-28 18:52 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 9+ messages in thread
From: Yang Hongyang @ 2017-03-24 9:23 UTC (permalink / raw)
To: quintela; +Cc: qemu-devel, dgilbert
On 2017/3/24 16:34, Juan Quintela wrote:
> Yang Hongyang <yanghongyang@huawei.com> wrote:
>> Hi Juan,
>>
>> First of all, I like the refactor patchset about RAMState, it makes
>> things clean, great!
>
> Thanks.
>
> The whole idea of the series was to make testing changes easier.
>
>> On 2017/3/24 5:01, Juan Quintela wrote:
>>> Hi
>>>
>>> This series splits the migration and unsent bitmaps by RAMBlock.
>>> This makes it easier to synchronize them in small bits. It is on top
>>> of the RAMState and not-hotplug series.
>>>
>>> Why?
>>>
>>> reason 1:
>>>
>>> People have complained that by the time that we detect that a page is
>>> sent, it has already been marked dirty "again" inside kvm, so we are
>>> going to send it again. On top of this patch, my idea is, for words
>>> of the bitmap that have any bit set, to synchronize the bitmap just
>>> before sending the pages. I have not looked into performance numbers
>>> yet, just asking for comments about how it is done.
>>
>> Here you said 'synchronize the bitmap just before sending the pages';
>> do you mean synchronize the bitmap from kvm? If so, I doubt the
>> performance, because every synchronization will require an ioctl().
>> If not, the per-block synchronization is useless.
>>
>> Currently, the migration thread synchronizes the bitmap from kvm every
>> iteration (migration_bitmap_sync()). The RAMBlock already has a kind of
>> per-block bitmap for this kind of sync, and the migration bitmap is used
>> to put all those per-block bitmaps together for sending the data.
>
> Hi
> For huge memory machines, we always do it in one go:
>
> bitmap_sync(1TB RAM)
>
> Then we walk the bitmap; by the time we have walked 512MB of RAM, it
> is very probable that the page we are looking at is dirty again in
> the KVM bitmap, so we send it, but as it is dirty again we will have
> to send it once more in the next pass. That send is completely useless.
I get your point; the problem is that KVM does not have the ability to
sync in small chunks currently, and even if it had, it would generate
lots of ioctls: KVM bitmap sync is per Memory Region IIRC. Say we have
a 1T mem guest with 16 MRs, for example; currently, every iteration we
need to do 16 ioctls. But if we sync in small chunks, 64M for example,
we might need to do 16384 ioctls in the worst case, e.g. when memory
pressure (dirty rate) is very high.
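(To spell out that worst case: 1T / 64M = 16384 chunks, so one full
sync pass could cost up to 16384 ioctls per iteration, against the 16,
one per MR, that we do today.)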
>
> So my idea is to split things into smaller chunks. As we have to do
> an ioctl, we wouldn't want to synchronize page by page, but perhaps
> 16MB at a time, or 64MB, anything less than the whole amount of memory.
>
> Later, Juan.
>
--
Thanks,
Yang
* Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock
2017-03-24 9:23 ` Yang Hongyang
@ 2017-03-28 18:52 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2017-03-28 18:52 UTC (permalink / raw)
To: Yang Hongyang; +Cc: quintela, qemu-devel
* Yang Hongyang (yanghongyang@huawei.com) wrote:
>
>
> On 2017/3/24 16:34, Juan Quintela wrote:
> > Yang Hongyang <yanghongyang@huawei.com> wrote:
> >> Hi Juan,
> >>
> >> First of all, I like the refactor patchset about RAMState, it makes
> >> things clean, great!
> >
> > Thanks.
> >
> > The whole idea of the series was to make testing changes easier.
> >
> >> On 2017/3/24 5:01, Juan Quintela wrote:
> >>> Hi
> >>>
> >>> This series splits the migration and unsent bitmaps by RAMBlock.
> >>> This makes it easier to synchronize them in small bits. It is on top
> >>> of the RAMState and not-hotplug series.
> >>>
> >>> Why?
> >>>
> >>> reason 1:
> >>>
> >>> People have complained that by the time that we detect that a page is
> >>> sent, it has already been marked dirty "again" inside kvm, so we are
> >>> going to send it again. On top of this patch, my idea is, for words
> >>> of the bitmap that have any bit set, to synchronize the bitmap just
> >>> before sending the pages. I have not looked into performance numbers
> >>> yet, just asking for comments about how it is done.
> >>
> >> Here you said 'synchronize the bitmap just before sending the pages';
> >> do you mean synchronize the bitmap from kvm? If so, I doubt the
> >> performance, because every synchronization will require an ioctl().
> >> If not, the per-block synchronization is useless.
> >>
> >> Currently, the migration thread synchronizes the bitmap from kvm every
> >> iteration (migration_bitmap_sync()). The RAMBlock already has a kind of
> >> per-block bitmap for this kind of sync, and the migration bitmap is used
> >> to put all those per-block bitmaps together for sending the data.
> >
> > Hi
> > For huge memory machines, we always do it in one go:
> >
> > bitmap_sync(1TB RAM)
> >
> > Then we walk the bitmap; by the time we have walked 512MB of RAM, it
> > is very probable that the page we are looking at is dirty again in
> > the KVM bitmap, so we send it, but as it is dirty again we will have
> > to send it once more in the next pass. That send is completely useless.
>
> I get your point; the problem is that KVM does not have the ability to
> sync in small chunks currently, and even if it had, it would generate
> lots of ioctls: KVM bitmap sync is per Memory Region IIRC. Say we have
> a 1T mem guest with 16 MRs, for example; currently, every iteration we
> need to do 16 ioctls. But if we sync in small chunks, 64M for example,
> we might need to do 16384 ioctls in the worst case, e.g. when memory
> pressure (dirty rate) is very high.
Yes, but I think people have suggested it should have that ability for
a long time, so hopefully we can add it.

I'd assumed the sizes of these chunks would be a bit bigger, a few GB
each maybe, and also assumed we'd have the syncing done in a separate
thread from the one doing the writing, trying to keep ahead of the
write pointer.

Even with the structure Juan has here, we could get to syncing each
NUMA node separately like that, which would kind of make some sense.
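A very rough shape of what I mean (purely hypothetical; none of these
helpers or fields exist today):

    /* Hypothetical sketch: a dedicated thread keeps re-syncing the
     * dirty log a fixed distance ahead of the sender's position.
     */
    static void *dirty_sync_thread(void *opaque)
    {
        RAMState *rs = opaque;
        const ram_addr_t lookahead = 1024 * 1024 * 1024; /* 1GB, say */

        while (!migration_is_done(rs)) {
            sync_dirty_log_ahead(rs, rs->send_position, lookahead);
        }
        return NULL;
    }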
Dave
>
> >
> > So my idea is to split things into smaller chunks. As we have to do
> > an ioctl, we wouldn't want to synchronize page by page, but perhaps
> > 16MB at a time, or 64MB, anything less than the whole amount of memory.
> >
> > Later, Juan.
> >
>
> --
> Thanks,
> Yang
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock
2017-03-23 21:01 [Qemu-devel] [RFC] Split migration bitmaps by ramblock Juan Quintela
2017-03-23 21:01 ` [Qemu-devel] [PATCH] ram: Split dirty bitmap by RAMBlock Juan Quintela
2017-03-24 1:57 ` [Qemu-devel] [RFC] Split migration bitmaps by ramblock Yang Hongyang
@ 2017-03-28 19:05 ` Dr. David Alan Gilbert
2017-03-29 8:51 ` Juan Quintela
2 siblings, 1 reply; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2017-03-28 19:05 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
* Juan Quintela (quintela@redhat.com) wrote:
> Hi
>
> This series splits the migration and unsent bitmaps by RAMBlock.
> This makes it easier to synchronize them in small bits. It is on top
> of the RAMState and not-hotplug series.
So I think generally this is a good idea; my main reason is I'd like
to see per-NUMA-node syncing, preferably tied with multi-fd so that
each of the NUMA nodes syncs and stuffs data over its own NIC.
Although we would still have to be careful about cases where one
node is hammering its RAM and the others are idling.
> Why?
>
> reason 1:
>
> People have complained that by the time that we detect that a page is
> sent, it has already been marked dirty "again" inside kvm, so we are
> going to send it again. On top of this patch, my idea is, for words
> of the bitmap that have any bit set, to synchronize the bitmap just
> before sending the pages. I have not looked into performance numbers
> yet, just asking for comments about how it is done.
>
> reason 2:
>
> In cases where the host page is a multiple of TARGET_PAGE_SIZE, we do
> a lot of work when we are synchronizing the bitmaps to convert them
> to target page size. The idea is to change the bitmaps on those
> RAMBlocks to mean host page size and not TARGET_PAGE_SIZE.
>
> Note that there are two reasons for this: ARM and PPC do things like
> guests with 4kb pages on hosts with 16/64kb pages, and then we have
> HugePages. Note all the workarounds that postcopy has to do to work
> at HugePage size.
There are some fun problems with changing the bitmap page size;
off the top of my head, the ones I can remember include:
a) I'm sure I've seen rare cases where a target page is marked as
dirty inside a hostpage; I'm guessing that was qemu's doing, but
there are more subtle cases, e.g. running a 4kb guest on a 64kb host;
it's legal - and 4kb power guests used to exist; I think in those
cases you see KVM only marking one target page as dirty.
b) Are we required to support migration across hosts of different pagesize;
and if we do that what size should a bit represent?
People asked about it during postcopy but I think it's restricted to
matching sizes. I don't think precopy has any requirement for matching
host pagesize at the moment. 64bit ARM does 4k, 64k and I think 16k was
added later.
c) Hugepages have similar issues; precopy doesn't currently have any
requirement for the hugepage selection on the two hosts to match,
but it does on postcopy. Also you don't want to have a single dirty
bit for a 1GB host hugepage if you can handle detecting changes at
a finer grain level.
Dave
> Please, comment?
>
> Later, Juan.
>
> Juan Quintela (1):
> ram: Split dirty bitmap by RAMBlock
>
> include/exec/ram_addr.h | 13 +++-
> migration/ram.c | 201 ++++++++++++++++++------------------------------
> 2 files changed, 85 insertions(+), 129 deletions(-)
>
> --
> 2.9.3
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock
2017-03-28 19:05 ` Dr. David Alan Gilbert
@ 2017-03-29 8:51 ` Juan Quintela
2017-03-31 17:50 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2017-03-29 8:51 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: qemu-devel
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> Note that there are two reasons for this: ARM and PPC do things like
>> guests with 4kb pages on hosts with 16/64kb pages, and then we have
>> HugePages. Note all the workarounds that postcopy has to do to work
>> at HugePage size.
>
> There are some fun problems with changing the bitmap page size;
> off the top of my head, the ones I can remember include:
> a) I'm sure I've seen rare cases where a target page is marked as
> dirty inside a hostpage; I'm guessing that was qemu's doing, but
> there are more subtle cases, e.g. running a 4kb guest on a 64kb host;
> it's legal - and 4kb power guests used to exist; I think in those
> cases you see KVM only marking one target page as dirty.
    /*
     * bitmap-traveling is faster than memory-traveling (for addr...)
     * especially when most of the memory is not dirty.
     */
    for (i = 0; i < len; i++) {
        if (bitmap[i] != 0) {
            c = leul_to_cpu(bitmap[i]);
            do {
                j = ctzl(c);
                c &= ~(1ul << j);
                page_number = (i * HOST_LONG_BITS + j) * hpratio;
                addr = page_number * TARGET_PAGE_SIZE;
                ram_addr = start + addr;
                cpu_physical_memory_set_dirty_range(ram_addr,
                                  TARGET_PAGE_SIZE * hpratio, clients);
            } while (c != 0);
        }
    }
This is the code that we end up using when we are synchronizing from
kvm, so if we don't have all pages of a host page set to one (or zero)
I think we are doing something wrong, no? Or am I misunderstanding the
code?
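(For example: 64kb host pages with 4kb target pages gives hpratio = 16,
so each bit set in the KVM bitmap dirties 16 consecutive target-page
bits, never a lone target page inside a host page.)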
> b) Are we required to support migration across hosts of different pagesize;
> and if we do that what size should a bit represent?
> People asked about it during postcopy but I think it's restricted to
> matching sizes. I don't think precopy has any requirement for matching
> host pagesize at the moment. 64bit ARM does 4k, 64k and I think 16k was
> added later.
With current precopy, we should work independently of the host page size
(famous last words), and in a 1st step, I will only send pages of the
size TARGET_PAGE_SIZE. I will only change the bitmaps. We can add
bigger pages later.
> c) Hugepages have similar issues; precopy doesn't currently have any
> requirement for the hugepage selection on the two hosts to match,
> but it does on postcopy. Also you don't want to have a single dirty
> bit for a 1GB host hugepage if you can handle detecting changes at
> a finer grain level.
I agree here, I was thinking more about the Power/ARM case than the
HugePage case. For the 2MB page, we could think about doing it; for
the 1GB case, it is not gonna work.
Later, Juan.
* Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock
2017-03-29 8:51 ` Juan Quintela
@ 2017-03-31 17:50 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2017-03-31 17:50 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> >> Note that there are two reasons for this: ARM and PPC do things like
> >> guests with 4kb pages on hosts with 16/64kb pages, and then we have
> >> HugePages. Note all the workarounds that postcopy has to do to work
> >> at HugePage size.
> >
> > There are some fun problems with changing the bitmap page size;
> > off the top of my head, the ones I can remember include:
> > a) I'm sure I've seen rare cases where a target page is marked as
> > dirty inside a hostpage; I'm guessing that was qemu's doing, but
> > there are more subtle cases, e.g. running a 4kb guest on a 64kb host;
> > it's legal - and 4kb power guests used to exist; I think in those
> > cases you see KVM only marking one target page as dirty.
>
>     /*
>      * bitmap-traveling is faster than memory-traveling (for addr...)
>      * especially when most of the memory is not dirty.
>      */
>     for (i = 0; i < len; i++) {
>         if (bitmap[i] != 0) {
>             c = leul_to_cpu(bitmap[i]);
>             do {
>                 j = ctzl(c);
>                 c &= ~(1ul << j);
>                 page_number = (i * HOST_LONG_BITS + j) * hpratio;
>                 addr = page_number * TARGET_PAGE_SIZE;
>                 ram_addr = start + addr;
>                 cpu_physical_memory_set_dirty_range(ram_addr,
>                                   TARGET_PAGE_SIZE * hpratio, clients);
>             } while (c != 0);
>         }
>     }
>
>
> This is the code that we end up using when we are synchronizing from
> kvm, so if we don't have all pages of a host page set to one (or zero)
> I think we are doing something wrong, no? Or am I misunderstanding the
> code?
Hmm, that does look like that - so perhaps the case I was seeing was just
qemu setting it somewhere?
(I definitely remember seeing it because I remember dumping the bitmaps
and checking for them; but I can't remember the circumstance)
> > b) Are we required to support migration across hosts of different pagesize;
> > and if we do that what size should a bit represent?
> > People asked about it during postcopy but I think it's restricted to
> > matching sizes. I don't think precopy has any requirement for matching
> > host pagesize at the moment. 64bit ARM does 4k, 64k and I think 16k was
> > added later.
>
> With current precopy, we should work independently of the host page size
> (famous last words), and in a 1st step, I will only send pages of the
> size TARGET_PAGE_SIZE. I will only change the bitmaps. We can add
> bigger pages later.
>
> > c) Hugepages have similar issues; precopy doesn't currently have any
> > requirement for the hugepage selection on the two hosts to match,
> > but it does on postcopy. Also you don't want to have a single dirty
> > bit for a 1GB host hugepage if you can handle detecting changes at
> > a finer grain level.
>
> I agree here, I was thinking more about the Power/ARM case than the
> HugePage case. For the 2MB page, we could think about doing it; for
> the 1GB case, it is not gonna work.
Yep,
Dave
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK