* [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition
@ 2012-10-18 7:26 Juan Quintela
2012-10-18 7:26 ` [Qemu-devel] [PATCH 01/72] fix virtfs Juan Quintela
0 siblings, 1 reply; 7+ messages in thread
From: Juan Quintela @ 2012-10-18 7:26 UTC (permalink / raw)
To: qemu-devel
Hi
This series applies on top of the refactoring that I sent yesterday.
Changes from the last version include:
- buffered_file.c is gone; its functionality is merged into migration.c.
  Pay special attention to the merge of buffered_file_thread() &
  migration_file_put_notify().
- Some more bitmap handling optimizations (thanks to Orit & Paolo for
  suggestions and code, and to Vinod for testing)
Please review. Included is the pointer to the full tree.
Thanks, Juan.
The following changes since commit b6348f29d033d5a8a26f633d2ee94362595f32a4:
target-arm/translate: Fix RRX operands (2012-10-17 19:56:46 +0200)
are available in the git repository at:
http://repo.or.cz/r/qemu/quintela.git migration-thread-20121017
for you to fetch changes up to 486dabc29f56d8f0e692395d4a6cd483b3a77f01:
ram: optimize migration bitmap walking (2012-10-18 09:20:34 +0200)
v3:
This is work in progress on top of the previous migration series just sent.
- Introduces a thread for migration instead of using a timer and callback
- remove the writes to the fd from under the iothread lock
- make the writes synchronous
- Introduce a new pending method that returns how many bytes are pending
  for one save live section (see the sketch after this list)
- the last patch just adds printfs to show where the time is being spent
  in the migration completion phase
  (yes, it pollutes all uses of stop on the monitor)
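For reference, this is roughly how the new pending method ends up being
used by the thread (a sketch of the idea only; the exact names and
signatures in the tree may differ, and error handling is omitted):

    /* Sketch: decide between another iteration and completion.
     * "file" and "max_size" (the downtime budget expressed in bytes at
     * the measured bandwidth) are assumed to exist already. */
    uint64_t pending = qemu_savevm_state_pending(file, max_size);

    if (pending > max_size) {
        /* too much data left: send another round of dirty pages */
        qemu_savevm_state_iterate(file);
    } else {
        /* small enough to fit in the downtime budget: stop the guest
         * and send the rest */
        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
        qemu_savevm_state_complete(file);
    }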
So far I have found that we spend a lot of time in bdrv_flush_all(). It
can take from 1ms to 600ms (yes, that is not a typo). That dwarfs the
default migration downtime (30ms).
Stop all vcpus:
- it works now (after the qemu_cpu_is_vcpu changes in the previous
  series); the caveat is that the time bdrv_flush_all() takes is
  "unpredictable". Any silver bullets?
Paolo suggested doing the following for the migration completion phase
(rough sketch below):
- bdrv_aio_flush_all();
- send the dirty pages;
- bdrv_drain_all();
- bdrv_flush_all();
- another round through the bitmap, in case completions have dirtied
  some pages.
Paolo, did I get it right? Any other suggestions?
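In pseudo-C, my understanding of the suggestion (a sketch only:
bdrv_aio_flush_all() is the proposed new helper and does not exist yet,
and send_remaining_dirty_pages() is just a placeholder for the RAM
migration code, not a real function):

    /* completion stage, as I understand Paolo's suggestion */
    bdrv_aio_flush_all();           /* start the flushes, don't wait      */
    send_remaining_dirty_pages(f);  /* overlap sending RAM with the I/O   */
    bdrv_drain_all();               /* wait for all in-flight requests    */
    bdrv_flush_all();               /* whatever is left should be cheap   */
    send_remaining_dirty_pages(f);  /* completions may have dirtied pages */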
- migrate_cancel() is not properly implemented (in the sense that we
  take no locks, ...)
- expected_downtime is not calculated.
  I am about to merge migrate_fd_put_ready() & buffered_thread(), and
  that would make it trivial to calculate (roughly the sketch below).
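The calculation itself is simple; the hard part is having the numbers in
one place (a sketch, assuming bandwidth is measured in bytes per
millisecond over the last iteration):

    /* sketch: time needed to send what is still pending */
    expected_downtime = pending_bytes / bandwidth;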
The timing patch outputs something like:
wakeup_request 0
time cpu_disable_ticks 0
time pause_all_vcpus 1
time runstate_set 1
time vmstate_notify 2
time bdrv_drain_all 2
time flush device /dev/disk/by-path/ip-192.168.10.200:3260-iscsi-iqn.2010-12.org.trasno:iscsi.lvm-lun-1: 3
time flush device : 3
time flush device : 3
time flush device : 3
time bdrv_flush_all 5
time monitor_protocol_event 5
vm_stop 2 5
synchronize_all_states 1
migrate RAM 37
migrate rest devices 1
complete without error 3a 44
completed 45
end completed stage 45
As you can see, we estimate that we can send all pending data in 30ms;
it took 37ms to send the RAM (which is what we estimate). So the
estimation is quite good.
What gives me lots of variation is the "time flush device" line that
carries the device name. That is what varies between 1ms and 600ms.
This is in a completely idle guest. I am running:
/* t0 and t1 are struct timeval; ms2us() just converts ms to microseconds */
struct timeval t0, t1;

while (1) {
    uint64_t delay;

    if (gettimeofday(&t0, NULL) != 0)
        perror("gettimeofday 1");
    if (usleep(ms2us(10)) != 0)
        perror("usleep");
    if (gettimeofday(&t1, NULL) != 0)
        perror("gettimeofday 2");

    /* elapsed time = t1 - t0 */
    t1.tv_usec -= t0.tv_usec;
    if (t1.tv_usec < 0) {
        t1.tv_usec += 1000000;
        t1.tv_sec--;
    }
    t1.tv_sec -= t0.tv_sec;

    delay = t1.tv_sec * 1000 + t1.tv_usec / 1000;
    if (delay > 100)
        printf("delay of %lu ms\n", (unsigned long)delay);
}
This is to see the latency inside the guest (i.e. ask for a 10ms sleep
and see how long it actually takes).
[root@d1 ~]# ./timer
delay of 161 ms
delay of 135 ms
delay of 143 ms
delay of 132 ms
delay of 131 ms
delay of 141 ms
delay of 113 ms
delay of 119 ms
delay of 114 ms
But those values are independent of migration. Even without starting
the migration, with an idle guest doing nothing, we see them sometimes.
Juan Quintela (27):
buffered_file: Move from using a timer to use a thread
migration: make qemu_fopen_ops_buffered() return void
migration: stop all cpus correctly
migration: make writes blocking
migration: remove unfreeze logic
migration: take finer locking
buffered_file: Unfold the trick to restart generating migration data
buffered_file: don't flush on put buffer
buffered_file: unfold buffered_append in buffered_put_buffer
savevm: New save live migration method: pending
migration: include qemu-file.h
migration-fd: remove duplicate include
migration: move buffered_file.c code into migration.c
migration: move migration_fd_put_ready()
migration: Inline qemu_fopen_ops_buffered into migrate_fd_connect
migration: move migration notifier
migration: move begining stage to the migration thread
migration: move exit condition to migration thread
migration: unfold rest of migrate_fd_put_ready() into thread
migration: print times for end phase
ram: rename last_block to last_seen_block
ram: Add last_sent_block
memory: introduce memory_region_test_and_clear_dirty
ram: Use memory_region_test_and_clear_dirty
fix memory.c
migration: Only go to the iterate stage if there is anything to send
ram: optimize migration bitmap walking
Paolo Bonzini (1):
split MRU ram list
Umesh Deshpande (2):
add a version number to ram_list
protect the ramlist with a separate mutex
Makefile.objs | 2 +-
arch_init.c | 133 +++++++++++--------
block-migration.c | 49 ++-----
block.c | 6 +
buffered_file.c | 256 -----------------------------------
buffered_file.h | 22 ---
cpu-all.h | 13 +-
cpus.c | 17 +++
exec.c | 44 +++++-
memory.c | 17 +++
memory.h | 18 +++
migration-exec.c | 4 +-
migration-fd.c | 9 +-
migration-tcp.c | 21 +--
migration-unix.c | 4 +-
migration.c | 391 ++++++++++++++++++++++++++++++++++++++++--------------
migration.h | 4 +-
qemu-file.h | 5 -
savevm.c | 37 +++++-
sysemu.h | 1 +
vmstate.h | 1 +
21 files changed, 522 insertions(+), 532 deletions(-)
delete mode 100644 buffered_file.c
delete mode 100644 buffered_file.h
--
1.7.11.7
* [Qemu-devel] [PATCH 01/72] fix virtfs
2012-10-18 7:26 [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition Juan Quintela
@ 2012-10-18 7:26 ` Juan Quintela
0 siblings, 0 replies; 7+ messages in thread
From: Juan Quintela @ 2012-10-18 7:26 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
fsdev/virtfs-proxy-helper.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/fsdev/virtfs-proxy-helper.c b/fsdev/virtfs-proxy-helper.c
index f9a8270..771dc4e 100644
--- a/fsdev/virtfs-proxy-helper.c
+++ b/fsdev/virtfs-proxy-helper.c
@@ -282,6 +282,7 @@ static int send_status(int sockfd, struct iovec *iovec, int status)
*/
static int setfsugid(int uid, int gid)
{
+ int ret;
/*
* We still need DAC_OVERRIDE because we don't change
* supplementary group ids, and hence may be subjected DAC rules
@@ -290,8 +291,14 @@ static int setfsugid(int uid, int gid)
CAP_DAC_OVERRIDE,
};
- setfsgid(gid);
- setfsuid(uid);
+ ret = setfsgid(gid);
+ if (ret < 0) {
+ return ret;
+ }
+ ret = setfsuid(uid);
+ if (ret < 0) {
+ return ret;
+ }
if (uid != 0 || gid != 0) {
return do_cap_set(cap_list, ARRAY_SIZE(cap_list), 0);
--
1.7.11.7
* [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition
@ 2012-10-18 7:29 Juan Quintela
2012-10-18 9:00 ` Paolo Bonzini
2012-10-26 13:04 ` Paolo Bonzini
0 siblings, 2 replies; 7+ messages in thread
From: Juan Quintela @ 2012-10-18 7:29 UTC (permalink / raw)
To: qemu-devel
Hi
This series applies on top of the refactoring that I sent yesterday.
Changes from the last version include:
- buffered_file.c is gone; its functionality is merged into migration.c.
  Pay special attention to the merge of buffered_file_thread() &
  migration_file_put_notify().
- Some more bitmap handling optimizations (thanks to Orit & Paolo for
  suggestions and code, and to Vinod for testing)
Please review. Included is the pointer to the full tree.
Thanks, Juan.
The following changes since commit b6348f29d033d5a8a26f633d2ee94362595f32a4:
target-arm/translate: Fix RRX operands (2012-10-17 19:56:46 +0200)
are available in the git repository at:
http://repo.or.cz/r/qemu/quintela.git migration-thread-20121017
for you to fetch changes up to 486dabc29f56d8f0e692395d4a6cd483b3a77f01:
ram: optimize migration bitmap walking (2012-10-18 09:20:34 +0200)
v3:
This is work in progress on top of the previous migration series just sent.
- Introduces a thread for migration instead of using a timer and callback
- remove the writes to the fd from under the iothread lock
- make the writes synchronous
- Introduce a new pending method that returns how many bytes are pending
  for one save live section
- the last patch just adds printfs to show where the time is being spent
  in the migration completion phase
  (yes, it pollutes all uses of stop on the monitor)
So far I have found that we spend a lot of time in bdrv_flush_all(). It
can take from 1ms to 600ms (yes, that is not a typo). That dwarfs the
default migration downtime (30ms).
Stop all vcpus:
- it works now (after the qemu_cpu_is_vcpu changes in the previous
  series); the caveat is that the time bdrv_flush_all() takes is
  "unpredictable". Any silver bullets?
Paolo suggested doing the following for the migration completion phase:
- bdrv_aio_flush_all();
- send the dirty pages;
- bdrv_drain_all();
- bdrv_flush_all();
- another round through the bitmap, in case completions have dirtied
  some pages.
Paolo, did I get it right? Any other suggestions?
- migrate_cancel() is not properly implemented (in the sense that we
  take no locks, ...)
- expected_downtime is not calculated.
  I am about to merge migrate_fd_put_ready() & buffered_thread(), and
  that would make it trivial to calculate.
It outputs something like:
wakeup_request 0
time cpu_disable_ticks 0
time pause_all_vcpus 1
time runstate_set 1
time vmstate_notify 2
time bdrv_drain_all 2
time flush device /dev/disk/by-path/ip-192.168.10.200:3260-iscsi-iqn.2010-12.org.trasno:iscsi.lvm-lun-1: 3
time flush device : 3
time flush device : 3
time flush device : 3
time bdrv_flush_all 5
time monitor_protocol_event 5
vm_stop 2 5
synchronize_all_states 1
migrate RAM 37
migrate rest devices 1
complete without error 3a 44
completed 45
end completed stage 45
As you can see, we estimate that we can send all pending data in 30ms;
it took 37ms to send the RAM (which is what we estimate). So the
estimation is quite good.
What gives me lots of variation is the "time flush device" line that
carries the device name. That is what varies between 1ms and 600ms.
This is in a completely idle guest. I am running:
/* t0 and t1 are struct timeval; ms2us() just converts ms to microseconds */
struct timeval t0, t1;

while (1) {
    uint64_t delay;

    if (gettimeofday(&t0, NULL) != 0)
        perror("gettimeofday 1");
    if (usleep(ms2us(10)) != 0)
        perror("usleep");
    if (gettimeofday(&t1, NULL) != 0)
        perror("gettimeofday 2");

    /* elapsed time = t1 - t0 */
    t1.tv_usec -= t0.tv_usec;
    if (t1.tv_usec < 0) {
        t1.tv_usec += 1000000;
        t1.tv_sec--;
    }
    t1.tv_sec -= t0.tv_sec;

    delay = t1.tv_sec * 1000 + t1.tv_usec / 1000;
    if (delay > 100)
        printf("delay of %lu ms\n", (unsigned long)delay);
}
This is to see the latency inside the guest (i.e. ask for a 10ms sleep
and see how long it actually takes).
[root@d1 ~]# ./timer
delay of 161 ms
delay of 135 ms
delay of 143 ms
delay of 132 ms
delay of 131 ms
delay of 141 ms
delay of 113 ms
delay of 119 ms
delay of 114 ms
But those values are independent of migration. Even without starting
the migration, with an idle guest doing nothing, we see them sometimes.
Juan Quintela (27):
buffered_file: Move from using a timer to use a thread
migration: make qemu_fopen_ops_buffered() return void
migration: stop all cpus correctly
migration: make writes blocking
migration: remove unfreeze logic
migration: take finer locking
buffered_file: Unfold the trick to restart generating migration data
buffered_file: don't flush on put buffer
buffered_file: unfold buffered_append in buffered_put_buffer
savevm: New save live migration method: pending
migration: include qemu-file.h
migration-fd: remove duplicate include
migration: move buffered_file.c code into migration.c
migration: move migration_fd_put_ready()
migration: Inline qemu_fopen_ops_buffered into migrate_fd_connect
migration: move migration notifier
migration: move begining stage to the migration thread
migration: move exit condition to migration thread
migration: unfold rest of migrate_fd_put_ready() into thread
migration: print times for end phase
ram: rename last_block to last_seen_block
ram: Add last_sent_block
memory: introduce memory_region_test_and_clear_dirty
ram: Use memory_region_test_and_clear_dirty
fix memory.c
migration: Only go to the iterate stage if there is anything to send
ram: optimize migration bitmap walking
Paolo Bonzini (1):
split MRU ram list
Umesh Deshpande (2):
add a version number to ram_list
protect the ramlist with a separate mutex
Makefile.objs | 2 +-
arch_init.c | 133 +++++++++++--------
block-migration.c | 49 ++-----
block.c | 6 +
buffered_file.c | 256 -----------------------------------
buffered_file.h | 22 ---
cpu-all.h | 13 +-
cpus.c | 17 +++
exec.c | 44 +++++-
memory.c | 17 +++
memory.h | 18 +++
migration-exec.c | 4 +-
migration-fd.c | 9 +-
migration-tcp.c | 21 +--
migration-unix.c | 4 +-
migration.c | 391 ++++++++++++++++++++++++++++++++++++++++--------------
migration.h | 4 +-
qemu-file.h | 5 -
savevm.c | 37 +++++-
sysemu.h | 1 +
vmstate.h | 1 +
21 files changed, 522 insertions(+), 532 deletions(-)
delete mode 100644 buffered_file.c
delete mode 100644 buffered_file.h
--
1.7.11.7
* Re: [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition
2012-10-18 7:29 [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition Juan Quintela
@ 2012-10-18 9:00 ` Paolo Bonzini
2012-10-26 13:04 ` Paolo Bonzini
1 sibling, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2012-10-18 9:00 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
On 18/10/2012 09:29, Juan Quintela wrote:
> v3:
>
> This is work in progress on top of the previous migration series just sent.
>
> - Introduces a thread for migration instead of using a timer and callback
> - remove the writes to the fd from under the iothread lock
> - make the writes synchronous
> - Introduce a new pending method that returns how many bytes are pending
>   for one save live section
> - the last patch just adds printfs to show where the time is being spent
>   in the migration completion phase
>   (yes, it pollutes all uses of stop on the monitor)
>
> So far I have found that we spend a lot of time in bdrv_flush_all(). It
> can take from 1ms to 600ms (yes, that is not a typo). That dwarfs the
> default migration downtime (30ms).
>
> Stop all vcpus:
>
> - it works now (after the qemu_cpu_is_vcpu changes in the previous
>   series); the caveat is that the time bdrv_flush_all() takes is
>   "unpredictable". Any silver bullets?
You could reuse the "block" live migration item. In block_save_pending,
start a bdrv_aio_flush() on all block devices that have already
completed the previous one.
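Something along these lines (a very rough sketch: block_save_pending is
whatever the block migration pending hook ends up being called, and
in_flight_flush is a made-up per-device flag, not an existing field):

    /* rough sketch: kick off one async flush per device from the block
     * migration "pending" hook, skipping devices whose previous flush
     * has not completed yet */
    static void blk_mig_flush_cb(void *opaque, int ret)
    {
        BlkMigDevState *bmds = opaque;

        bmds->in_flight_flush = false;
    }

    static uint64_t block_save_pending(QEMUFile *f, void *opaque,
                                       uint64_t max_size)
    {
        BlkMigDevState *bmds;

        QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
            if (!bmds->in_flight_flush) {
                bmds->in_flight_flush = true;
                bdrv_aio_flush(bmds->bs, blk_mig_flush_cb, bmds);
            }
        }
        return get_remaining_dirty();   /* existing estimate of data left */
    }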
But that's not a regression in the migration thread series, is it?
Paolo
* Re: [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition
[not found] <4168C988EBDF2141B4E0B6475B6A73D101904FFB@G6W2493.americas.hpqcorp.net>
@ 2012-10-24 13:49 ` Chegu Vinod
2012-10-24 14:29 ` Chegu Vinod
0 siblings, 1 reply; 7+ messages in thread
From: Chegu Vinod @ 2012-10-24 13:49 UTC (permalink / raw)
To: qemu-devel, Juan Jose Quintela Carreira
On 10/24/2012 6:40 AM, Vinod, Chegu wrote:
> [snip: full 00/30 cover letter quoted]
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Using these patches I have verified live migration (on x86_64 platforms)
for guest sizes varying from 64G/10vcpus through 768G/80vcpus, and I have
seen a reduction in both the downtime and the total migration time. The
dirty bitmap optimizations have shown improvements too and have helped
reduce the downtime (perhaps more can be done as a next step, i.e. after
the above changes, minus the printf's, make it into upstream). The new
migration stats that were added were useful too!
Thanks
Vinod
* Re: [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition
2012-10-24 13:49 ` Chegu Vinod
@ 2012-10-24 14:29 ` Chegu Vinod
0 siblings, 0 replies; 7+ messages in thread
From: Chegu Vinod @ 2012-10-24 14:29 UTC (permalink / raw)
To: qemu-devel, Juan Jose Quintela Carreira
On 10/24/2012 6:49 AM, Chegu Vinod wrote:
> On 10/24/2012 6:40 AM, Vinod, Chegu wrote:
>> [snip: full 00/30 cover letter quoted]
>
> Tested-by: Chegu Vinod <chegu_vinod@hp.com>
>
>
> Using these patches I have verified live migration (on x86_64
> platforms) for guest sizes varying from 64G/10vcpus through
> 768G/80vcpus, and I have seen a reduction in both the downtime and the
> total migration time. The dirty bitmap optimizations have shown
> improvements too and have helped reduce the downtime (perhaps more can
> be done as a next step, i.e. after the above changes, minus the
> printf's, make it into upstream). The new migration stats that were
> added were useful too!
>
> Thanks
> Vinod
>
Wanted to follow up on an issue that I had observed. <Already shared
this with Juan/Orit/Paolo but forgot to mention it in the email above!>
As mentioned above, for larger (>= 256G) guests the cost of the dirty
bitmap sync-up is high. At the very start of the migration, i.e. in
ram_save_setup(), I noticed that a lot of time was being spent syncing
up the dirty bitmaps (and also, perhaps, marking the pages as dirty);
this leads to a multi-second freeze of the guest. As part of optimizing
the dirty bitmap sync-up this issue needs to be addressed.
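(For the record, my rough reading of where that time goes; this is a
paraphrase of the setup path, not the exact code in the tree, and all of
it runs with the iothread lock held, so the guest cannot run while all
of RAM is walked:)

    /* paraphrase of the ram_save_setup() work before the first
     * iteration: mark every page of every RAM block dirty and turn on
     * dirty logging, all under the iothread lock */
    RAMBlock *block;
    ram_addr_t addr;

    QLIST_FOREACH(block, &ram_list.blocks, next) {
        for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
            memory_region_set_dirty(block->mr, addr, TARGET_PAGE_SIZE);
        }
    }
    memory_global_dirty_log_start();
    /* followed by the first full dirty-bitmap sync */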
Thanks
Vinod
* Re: [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition
2012-10-18 7:29 [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition Juan Quintela
2012-10-18 9:00 ` Paolo Bonzini
@ 2012-10-26 13:04 ` Paolo Bonzini
1 sibling, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2012-10-26 13:04 UTC (permalink / raw)
To: Juan Quintela; +Cc: Anthony Liguori, qemu-devel
On 18/10/2012 09:29, Juan Quintela wrote:
> Hi
>
> This series applies on top of the refactoring that I sent yesterday.
> Changes from the last version include:
>
> - buffered_file.c is gone; its functionality is merged into migration.c.
>   Pay special attention to the merge of buffered_file_thread() &
>   migration_file_put_notify().
>
> - Some more bitmap handling optimizations (thanks to Orit & Paolo for
>   suggestions and code, and to Vinod for testing)
>
> Please review. Included is the pointer to the full tree.
Anthony, I think patches 1-13 are ready to go (but only if Juan reviews
my incoming-migration-in-a-coroutine first ;)).
Paolo
End of thread (newest: 2012-10-26 13:05 UTC)
Thread overview: 7+ messages
2012-10-18 7:26 [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition Juan Quintela
2012-10-18 7:26 ` [Qemu-devel] [PATCH 01/72] fix virtfs Juan Quintela
-- strict thread matches above, loose matches on Subject: below --
2012-10-18 7:29 [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition Juan Quintela
2012-10-18 9:00 ` Paolo Bonzini
2012-10-26 13:04 ` Paolo Bonzini
[not found] <4168C988EBDF2141B4E0B6475B6A73D101904FFB@G6W2493.americas.hpqcorp.net>
2012-10-24 13:49 ` Chegu Vinod
2012-10-24 14:29 ` Chegu Vinod