[PATCH v7 0/2] Optimized some codes and fixed PVM hang when enabling auto-converge

* [PATCH v7 0/2] Optimized some codes and fixed PVM hang when enabling auto-converge
@ 2021-11-09  3:04 Rao, Lei
  2021-11-09  3:04 ` [PATCH v7 1/2] Reset the auto-converge counter at every checkpoint Rao, Lei
  2021-11-09  3:04 ` [PATCH v7 2/2] Reduce the PVM stop time during Checkpoint Rao, Lei
  0 siblings, 2 replies; 5+ messages in thread
From: Rao, Lei @ 2021-11-09  3:04 UTC (permalink / raw)
  To: chen.zhang, zhang.zhanghailiang, quintela, lukasstraub2, dgilbert
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

Changes since v1-v6:
    --Reset the state of the auto-converge counters at every checkpoint instead of directly disabling.
    --Remove cpu_throttle_stop from mig_throttle_counter_reset.

This patch series:
    Reduces the PVM stop time during checkpoint.
    Fixes the PVM hang when the auto-converge feature is enabled for COLO.

Rao, Lei (2):
  Reset the auto-converge counter at every checkpoint.
  Reduce the PVM stop time during Checkpoint

 migration/colo.c |  4 ++++
 migration/ram.c  | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 migration/ram.h  |  1 +
 3 files changed, 59 insertions(+), 3 deletions(-)

-- 
1.8.3.1




* [PATCH v7 1/2] Reset the auto-converge counter at every checkpoint.
  2021-11-09  3:04 [PATCH v7 0/2] Optimized some codes and fixed PVM hang when enabling auto-converge Rao, Lei
@ 2021-11-09  3:04 ` Rao, Lei
  2021-11-09  7:48   ` Juan Quintela
  2021-11-09  3:04 ` [PATCH v7 2/2] Reduce the PVM stop time during Checkpoint Rao, Lei
  1 sibling, 1 reply; 5+ messages in thread
From: Rao, Lei @ 2021-11-09  3:04 UTC (permalink / raw)
  To: chen.zhang, zhang.zhanghailiang, quintela, lukasstraub2, dgilbert
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

If we don't reset the auto-converge counter,
it keeps running while COLO is running,
and the system eventually hangs because the
CPU throttle reaches DEFAULT_MIGRATE_MAX_CPU_THROTTLE.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
---
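Note (illustration only, not part of the patch): the sketch below is a
toy model of why stale period counters can keep triggering throttling
and how resetting them at a checkpoint starts a fresh measurement
period. PeriodStats, wants_throttle(), stats_reset() and the constants
are hypothetical names and values chosen for this example; they only
mirror the three fields that mig_throttle_counter_reset() touches and
are not QEMU's actual throttling logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three RAMState fields the patch resets. */
typedef struct {
    int64_t  time_last_sync_ms;   /* start of the current measurement period */
    uint64_t dirty_pages_period;  /* pages dirtied since the period started */
    uint64_t bytes_xfer_prev;     /* bytes transferred when the period started */
} PeriodStats;

/* Assumed values for the toy model only. */
enum { PAGE_SIZE_BYTES = 4096, PERIOD_MS = 1000, THRESHOLD_PCT = 50 };

/*
 * Toy check: ask for throttling once a full period has elapsed and the
 * guest dirtied more than THRESHOLD_PCT of what was transferred in it.
 */
static bool wants_throttle(const PeriodStats *st, int64_t now_ms,
                           uint64_t bytes_xfer_now)
{
    assert(bytes_xfer_now >= st->bytes_xfer_prev);

    if (now_ms <= st->time_last_sync_ms + PERIOD_MS) {
        return false;
    }
    uint64_t dirty_bytes = st->dirty_pages_period * PAGE_SIZE_BYTES;
    uint64_t xfer_bytes = bytes_xfer_now - st->bytes_xfer_prev;
    return dirty_bytes > xfer_bytes * THRESHOLD_PCT / 100;
}

/* Same shape as mig_throttle_counter_reset(): begin a new period. */
static void stats_reset(PeriodStats *st, int64_t now_ms,
                        uint64_t bytes_xfer_now)
{
    st->time_last_sync_ms = now_ms;
    st->dirty_pages_period = 0;
    st->bytes_xfer_prev = bytes_xfer_now;
}
```

In this model, counters left over from before a checkpoint make
wants_throttle() fire again, while calling stats_reset() at the
checkpoint makes the next check start from zero.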
 migration/colo.c | 4 ++++
 migration/ram.c  | 9 +++++++++
 migration/ram.h  | 1 +
 3 files changed, 14 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index e3b1f13..2415325 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -459,6 +459,10 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     if (ret < 0) {
         goto out;
     }
+
+    if (migrate_auto_converge()) {
+        mig_throttle_counter_reset();
+    }
     /*
      * Only save VM's live state, which not including device state.
      * TODO: We may need a timeout mechanism to prevent COLO process
diff --git a/migration/ram.c b/migration/ram.c
index 847af46..d5f98e6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -641,6 +641,15 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
     }
 }
 
+void mig_throttle_counter_reset(void)
+{
+    RAMState *rs = ram_state;
+
+    rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    rs->num_dirty_pages_period = 0;
+    rs->bytes_xfer_prev = ram_counters.transferred;
+}
+
 /**
  * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache
  *
diff --git a/migration/ram.h b/migration/ram.h
index dda1988..c515396 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -50,6 +50,7 @@ bool ramblock_is_ignored(RAMBlock *block);
 int xbzrle_cache_resize(uint64_t new_size, Error **errp);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_total(void);
+void mig_throttle_counter_reset(void);
 
 uint64_t ram_pagesize_summary(void);
 int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
-- 
1.8.3.1




* Re: [PATCH v7 1/2] Reset the auto-converge counter at every checkpoint.
  2021-11-09  3:04 ` [PATCH v7 1/2] Reset the auto-converge counter at every checkpoint Rao, Lei
@ 2021-11-09  7:48   ` Juan Quintela
  0 siblings, 0 replies; 5+ messages in thread
From: Juan Quintela @ 2021-11-09  7:48 UTC (permalink / raw)
  To: Rao, Lei
  Cc: qemu-devel, chen.zhang, lukasstraub2, zhang.zhanghailiang,
	dgilbert

"Rao, Lei" <lei.rao@intel.com> wrote:
> From: "Rao, Lei" <lei.rao@intel.com>
>
> if we don't reset the auto-converge counter,
> it will continue to run with COLO running,
> and eventually the system will hang due to the
> CPU throttle reaching DEFAULT_MIGRATE_MAX_CPU_THROTTLE.
>
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Lukas Straub <lukasstraub2@web.de>
> Tested-by: Lukas Straub <lukasstraub2@web.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* [PATCH v7 2/2] Reduce the PVM stop time during Checkpoint
  2021-11-09  3:04 [PATCH v7 0/2] Optimized some codes and fixed PVM hang when enabling auto-converge Rao, Lei
  2021-11-09  3:04 ` [PATCH v7 1/2] Reset the auto-converge counter at every checkpoint Rao, Lei
@ 2021-11-09  3:04 ` Rao, Lei
  2021-11-09  7:45   ` Juan Quintela
  1 sibling, 1 reply; 5+ messages in thread
From: Rao, Lei @ 2021-11-09  3:04 UTC (permalink / raw)
  To: chen.zhang, zhang.zhanghailiang, quintela, lukasstraub2, dgilbert
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

When flushing memory from the RAM cache to RAM at every checkpoint
on the secondary VM, we can copy contiguous chunks of memory instead of
4096 bytes at a time, which reduces the VM stop time during checkpoint.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
---
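Note (illustration only, not part of the patch): a minimal standalone
sketch of the idea, assuming one dirty flag byte per page instead of
QEMU's packed bitmap. find_dirty_run() and flush_cache() are
hypothetical helpers written for this example; they are not QEMU
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PAGE_SIZE_BYTES = 4096 };  /* assumed page size for the example */

/*
 * Find the first run of dirty pages at or after 'start'.
 * Returns the index of the first dirty page (npages if there is none)
 * and stores the length of the run in *num.
 */
static size_t find_dirty_run(const unsigned char *dirty, size_t npages,
                             size_t start, size_t *num)
{
    size_t first = start;

    while (first < npages && !dirty[first]) {
        first++;
    }
    size_t next = first;
    while (next < npages && dirty[next]) {
        next++;
    }
    *num = next - first;
    return first;
}

/*
 * Copy every dirty run from 'cache' to 'dst' with one memcpy per run,
 * clearing the dirty flags as we go. Returns the number of memcpy calls.
 */
static size_t flush_cache(unsigned char *dst, const unsigned char *cache,
                          unsigned char *dirty, size_t npages)
{
    size_t copies = 0;
    size_t offset = 0;
    size_t num = 0;

    while ((offset = find_dirty_run(dirty, npages, offset, &num)) < npages) {
        assert(num > 0);
        memset(dirty + offset, 0, num);
        memcpy(dst + offset * PAGE_SIZE_BYTES,
               cache + offset * PAGE_SIZE_BYTES,
               num * PAGE_SIZE_BYTES);
        offset += num;
        copies++;
    }
    return copies;
}
```

With dirty pages 1-3 and 6 out of 8, this sketch issues two memcpy
calls where a page-at-a-time loop would issue four.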
 migration/ram.c | 48 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d5f98e6..863035d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -845,6 +845,41 @@ migration_clear_memory_region_dirty_bitmap_range(RAMBlock *rb,
     }
 }
 
+/*
+ * colo_bitmap_find_dirty: find contiguous dirty pages from start
+ *
+ * Returns the page offset within memory region of the start of the contiguous
+ * dirty pages
+ *
+ * @rs: current RAM state
+ * @rb: RAMBlock where to search for dirty pages
+ * @start: page where we start the search
+ * @num: the number of contiguous dirty pages
+ */
+static inline
+unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
+                                     unsigned long start, unsigned long *num)
+{
+    unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
+    unsigned long *bitmap = rb->bmap;
+    unsigned long first, next;
+
+    *num = 0;
+
+    if (ramblock_is_ignored(rb)) {
+        return size;
+    }
+
+    first = find_next_bit(bitmap, size, start);
+    if (first >= size) {
+        return first;
+    }
+    next = find_next_zero_bit(bitmap, size, first + 1);
+    assert(next >= first);
+    *num = next - first;
+    return first;
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                 RAMBlock *rb,
                                                 unsigned long page)
@@ -3895,19 +3930,26 @@ void colo_flush_ram_cache(void)
         block = QLIST_FIRST_RCU(&ram_list.blocks);
 
         while (block) {
-            offset = migration_bitmap_find_dirty(ram_state, block, offset);
+            unsigned long num = 0;
 
+            offset = colo_bitmap_find_dirty(ram_state, block, offset, &num);
             if (!offset_in_ramblock(block,
                                     ((ram_addr_t)offset) << TARGET_PAGE_BITS)) {
                 offset = 0;
+                num = 0;
                 block = QLIST_NEXT_RCU(block, next);
             } else {
-                migration_bitmap_clear_dirty(ram_state, block, offset);
+                unsigned long i = 0;
+
+                for (i = 0; i < num; i++) {
+                    migration_bitmap_clear_dirty(ram_state, block, offset + i);
+                }
                 dst_host = block->host
                          + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
                 src_host = block->colo_cache
                          + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
-                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE * num);
+                offset += num;
             }
         }
     }
-- 
1.8.3.1




* Re: [PATCH v7 2/2] Reduce the PVM stop time during Checkpoint
  2021-11-09  3:04 ` [PATCH v7 2/2] Reduce the PVM stop time during Checkpoint Rao, Lei
@ 2021-11-09  7:45   ` Juan Quintela
  0 siblings, 0 replies; 5+ messages in thread
From: Juan Quintela @ 2021-11-09  7:45 UTC (permalink / raw)
  To: Rao, Lei
  Cc: qemu-devel, chen.zhang, lukasstraub2, zhang.zhanghailiang,
	dgilbert

"Rao, Lei" <lei.rao@intel.com> wrote:
> From: "Rao, Lei" <lei.rao@intel.com>
>
> When flushing memory from ram cache to ram during every checkpoint
> on secondary VM, we can copy continuous chunks of memory instead of
> 4096 bytes per time to reduce the time of VM stop during checkpoint.
>
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Lukas Straub <lukasstraub2@web.de>
> Tested-by: Lukas Straub <lukasstraub2@web.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Queued.


