qemu-devel.nongnu.org archive mirror
From: Chegu Vinod <chegu_vinod@hp.com>
To: qemu-devel@nongnu.org
Cc: Orit Wasserman <owasserm@redhat.com>,
	Juan Jose Quintela Carreira <quintela@redhat.com>
Subject: Re: [Qemu-devel] Fwd:  [RFC 00/27] Migration thread (WIP)
Date: Thu, 26 Jul 2012 11:41:09 -0700	[thread overview]
Message-ID: <50118F45.6050909@hp.com> (raw)
In-Reply-To: <500EF579.5040607@redhat.com>




> -------- Original Message --------
> Subject: 	[Qemu-devel] [RFC 00/27] Migration thread (WIP)
> Date: 	Tue, 24 Jul 2012 20:36:25 +0200
> From: 	Juan Quintela <quintela@redhat.com>
> To: 	qemu-devel@nongnu.org
>
>
>
> Hi
>
> This series is on top of the migration-next-v5 series just posted.
>
> First of all, this is an RFC/work in progress.  A lot of people
> asked for it, and I would like review of the design.
Hello,

Thanks for sharing this early/WIP version for evaluation.

Still in the middle of the code review, but wanted to share a couple of
quick observations.
I tried to use it to migrate a 128G/10-vCPU guest (speed set to 10G and
downtime to 2s): once with no workload (i.e. an idle guest), and a
second time with SpecJBB running in the guest.

The idle guest case seemed to migrate fine...


capabilities: xbzrle: off
Migration status: completed
transferred ram: 3811345 kbytes
remaining ram: 0 kbytes
total ram: 134226368 kbytes
total time: 199743 milliseconds


In the case of SpecJBB I ran into issues during stage 3: the source
host's qemu and the guest hung.  I need to debug this more... (if you
already have some hints, please let me know).


capabilities: xbzrle: off
Migration status: active
transferred ram: 127618578 kbytes
remaining ram: 2386832 kbytes
total ram: 134226368 kbytes
total time: 526139 milliseconds
(qemu) qemu_savevm_state_complete called
qemu_savevm_state_complete calling ram_save_complete

<---  hung somewhere after this (I need to get more info).


---

As with the non-migration-thread version, the SpecJBB workload completed
before the migration attempted to move to stage 3 (i.e. it didn't
converge while the workload was still active).

BTW, with this version of the bits (i.e. while running SpecJBB, which is
supposed to dirty quite a bit of memory) I noticed that there wasn't
much change in the bandwidth usage of the dedicated 10Gb private network
link (it was still < ~1.5-3.0 Gb/sec).  I expected this to be a little
better since we have a separate thread... not sure what else is in play
here (NUMA locality of where the migration thread runs, or some other
basic tuning in the implementation?).

I have a high-level design question... (perhaps folks have already
thought about it and categorized it as a potential future
optimization?)

Would it be possible to offload the iothread completely [from all
migration-related activity]: have one thread (with the appropriate
protection) handle getting the list of dirty pages, and one or more
threads dedicated to pushing multiple streams of data to saturate the
allocated network bandwidth?  This may help with large + busy guests.
Comments?  There are perhaps other implications of doing all of this
(like burning more host CPU cycles), but perhaps this could be made
configurable based on the user's needs, e.g. fewer but larger guests on
a host with no oversubscription.

Thanks
Vinod


> It does:
> - get a new bitmap for migration; that bitmap uses 1 bit per page
> - it unfolds migration_buffered_file (only one user existed)
> - it simplifies buffered_file a lot
>
> - About the migration thread, special attention was given to trying
>    to get the series reviewable (reviewers will tell me if I got it).
>
> Basic design:
> - we create a new thread instead of a timer function
> - we move all the migration work to that thread (but run everything
>    except the waits with the iothread lock held)
> - we move all the writing outside the iothread lock, i.e.
>    we walk the state with the iothread lock held and copy everything
>    to one buffer, then we write that buffer to the sockets outside
>    the iothread lock
> - once here, we move to writing synchronously to the sockets
> - this allows us to simplify quite a lot
>
> And basically, that is it.  Notice that we still do the iterative
> page walking with the iothread lock held.  Light testing shows that
> we get similar speeds and latencies to the version without the thread
> (note that almost no optimization has been done here yet).
>
> Apart from the review:
> - Are there any locking issues that I have missed?  (I guess so.)
> - Stop all cpus correctly.  vm_stop should be called from the
>    iothread; I use the trick of a bottom half to get that working
>    correctly, but this _implementation_ is ugly as hell.  Is there an
>    easier way of doing it?
> - Do I really have to export last_ram_offset()?  Is there no other
>    way of knowing the amount of RAM?
>
> Known issues:
>
> - for some reason, when it has to start a 2nd round of bitmap
>    handling, it decides to dirty all pages.  I still haven't found
>    out why this happens.
>
> If you can test it and tell me where it breaks, that would also help.
>
> Work is based on Umesh's thread work, and on work that Paolo Bonzini
> had done on top of that.  All the migration thread code was done from
> scratch because I was unable to debug why it was failing, but it
> "owes" a lot to the previous design.
>
> Thanks in advance, Juan.
>
> The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:
>
>    Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23 13:15:34 -0500)
>
> are available in the git repository at:
>
>
>    http://repo.or.cz/r/qemu/quintela.git  migration-thread-v1
>
> for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:
>
>    buffered_file: unfold buffered_append in buffered_put_buffer (2012-07-24 16:46:13 +0200)
>
>
> Juan Quintela (23):
>    buffered_file: g_realloc() can't fail
>    savevm: Factorize ram globals reset in its own function
>    ram: introduce migration_bitmap_set_dirty()
>    ram: Introduce migration_bitmap_test_and_reset_dirty()
>    ram: Export last_ram_offset()
>    ram: introduce migration_bitmap_sync()
>    Separate migration bitmap
>    buffered_file: rename opaque to migration_state
>    buffered_file: opaque is MigrationState
>    buffered_file: unfold migrate_fd_put_buffer
>    buffered_file: unfold migrate_fd_put_ready
>    buffered_file: unfold migrate_fd_put_buffer
>    buffered_file: unfold migrate_fd_put_buffer
>    buffered_file: We can access directly to bandwidth_limit
>    buffered_file: Move from using a timer to use a thread
>    migration: make qemu_fopen_ops_buffered() return void
>    migration: stop all cpus correctly
>    migration: make writes blocking
>    migration: remove unfreeze logic
>    migration: take finer locking
>    buffered_file: Unfold the trick to restart generating migration data
>    buffered_file: don't flush on put buffer
>    buffered_file: unfold buffered_append in buffered_put_buffer
>
> Paolo Bonzini (2):
>    split MRU ram list
>    BufferedFile: append, then flush
>
> Umesh Deshpande (2):
>    add a version number to ram_list
>    protect the ramlist with a separate mutex
>
>   arch_init.c      |  108 +++++++++++++++++++++++++-------
>   buffered_file.c  |  179 +++++++++++++++++-------------------------------------
>   buffered_file.h  |   12 +---
>   cpu-all.h        |   17 +++++-
>   exec-obsolete.h  |   10 ---
>   exec.c           |   45 +++++++++++---
>   migration-exec.c |    2 -
>   migration-fd.c   |    6 --
>   migration-tcp.c  |    2 +-
>   migration-unix.c |    2 -
>   migration.c      |  111 ++++++++++++++-------------------
>   migration.h      |    6 ++
>   qemu-file.h      |    5 --
>   savevm.c         |    5 --
>   14 files changed, 249 insertions(+), 261 deletions(-)
>
> -- 
> 1.7.10.4





Thread overview: 45+ messages
2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 01/27] buffered_file: g_realloc() can't fail Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 02/27] split MRU ram list Juan Quintela
2012-07-25 20:20   ` Michael Roth
2012-07-26 13:19     ` Avi Kivity
2012-07-24 18:36 ` [Qemu-devel] [PATCH 03/27] savevm: Factorize ram globals reset in its own function Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 04/27] add a version number to ram_list Juan Quintela
2012-07-25 23:27   ` Michael Roth
2012-07-26  9:19     ` Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 05/27] protect the ramlist with a separate mutex Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 06/27] ram: introduce migration_bitmap_set_dirty() Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 07/27] ram: Introduce migration_bitmap_test_and_reset_dirty() Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 08/27] ram: Export last_ram_offset() Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 09/27] ram: introduce migration_bitmap_sync() Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 10/27] Separate migration bitmap Juan Quintela
2012-07-25  9:16   ` Avi Kivity
2012-07-26  9:22     ` Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 11/27] BufferedFile: append, then flush Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 12/27] buffered_file: rename opaque to migration_state Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 13/27] buffered_file: opaque is MigrationState Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 14/27] buffered_file: unfold migrate_fd_put_buffer Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 15/27] buffered_file: unfold migrate_fd_put_ready Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 16/27] buffered_file: unfold migrate_fd_put_buffer Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 17/27] " Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 18/27] buffered_file: We can access directly to bandwidth_limit Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 19/27] buffered_file: Move from using a timer to use a thread Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 20/27] migration: make qemu_fopen_ops_buffered() return void Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 21/27] migration: stop all cpus correctly Juan Quintela
2012-07-26 12:54   ` Eric Blake
2012-07-24 18:36 ` [Qemu-devel] [PATCH 22/27] migration: make writes blocking Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 23/27] migration: remove unfreeze logic Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 24/27] migration: take finer locking Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 25/27] buffered_file: Unfold the trick to restart generating migration data Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 26/27] buffered_file: don't flush on put buffer Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 27/27] buffered_file: unfold buffered_append in buffered_put_buffer Juan Quintela
2012-07-25  9:55 ` [Qemu-devel] [RFC 00/27] Migration thread (WIP) Orit Wasserman
2012-07-26 10:57 ` Jan Kiszka
2012-07-26 11:16   ` Juan Quintela
2012-07-26 11:56     ` Jan Kiszka
     [not found] ` <500EF579.5040607@redhat.com>
2012-07-26 18:41   ` Chegu Vinod [this message]
2012-07-26 21:26     ` [Qemu-devel] Fwd: " Chegu Vinod
2012-07-27 11:05       ` Juan Quintela
     [not found]         ` <4168C988EBDF2141B4E0B6475B6A73D1165CDD@G6W2493.americas.hpqcorp.net>
2012-07-27 14:21           ` [Qemu-devel] FW: " Chegu Vinod
2012-07-26 21:36 ` [Qemu-devel] " Michael Roth
2012-08-02 12:01   ` Juan Quintela
