On 7/26/2012 11:41 AM, Chegu Vinod wrote:
>
>
>> -------- Original Message --------
>> Subject: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
>> Date: Tue, 24 Jul 2012 20:36:25 +0200
>> From: Juan Quintela
>> To: qemu-devel@nongnu.org
>>
>>
>>
>> Hi
>>
>> This series is on top of the migration-next-v5 series just posted.
>>
>> First of all, this is an RFC/work in progress. A lot of people
>> asked for it, and I would like a review of the design.
> Hello,
>
> Thanks for sharing this early/WIP version for evaluation.
>
> I'm still in the middle of the code review, but wanted to share a couple of
> quick observations.
> I tried to use it to migrate a 128G/10VCPU guest (speed set to 10G and
> downtime 2s), once with no workload (i.e. an idle guest) and once with
> SpecJBB running in the guest.
>
> The idle guest case seemed to migrate fine...
>
>
> capabilities: xbzrle: off
> Migration status: completed
> transferred ram: 3811345 kbytes
> remaining ram: 0 kbytes
> total ram: 134226368 kbytes
> total time: 199743 milliseconds
>
>
> In the SpecJBB case I ran into issues during stage 3: the
> source host's qemu and the guest hung. I need to debug this more
> (if you already have some hints, please let me know).
>
>
> capabilities: xbzrle: off
> Migration status: active
> transferred ram: 127618578 kbytes
> remaining ram: 2386832 kbytes
> total ram: 134226368 kbytes
> total time: 526139 milliseconds
> (qemu) qemu_savevm_state_complete called
> qemu_savevm_state_complete calling ram_save_complete
>
> <--- hung somewhere after this (need to get more info).
>

It appears to be a race condition, as there are cases when it hangs
and others when it succeeds:

(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
transferred ram: 129937687 kbytes
remaining ram: 0 kbytes
total ram: 134226368 kbytes
total time: 543228 milliseconds

Need to review/debug...
Vinod

>
> ---
>
> As with the non-migration-thread version, the SpecJBB workload
> completed before the migration attempted to move to stage 3 (i.e. it
> didn't converge while the workload was still active).
>
> BTW, with this version of the bits (i.e. while running SpecJBB, which
> is supposed to dirty quite a bit of memory) I noticed that there
> wasn't much change in the bandwidth usage of the dedicated 10Gb private
> network link (it was still < ~1.5-3.0 Gb/sec). I expected this to be a
> little better since we have a separate thread... not sure what else
> is in play here (NUMA locality of where the migration thread runs, or
> some other basic tuning in the implementation?).
>
> I have a high-level design question (perhaps folks have already
> thought about it and categorized it as a potential future optimization?).
>
> Would it be possible to offload the iothread completely [from all
> migration-related activity] and have one thread (with the appropriate
> protection) get the list of dirty pages? And have one or more threads
> dedicated to pushing multiple streams of data to saturate the
> allocated network bandwidth? This may help with large, busy guests.
> Comments? There are perhaps other implications of doing all of this
> (like burning more host CPU cycles), but perhaps this can be made
> configurable based on the user's needs, e.g. fewer but larger guests
> on a host with no oversubscription.
>
> Thanks
> Vinod
>
>
>> It does:
>> - get a new bitmap for migration, and that bitmap uses 1 bit per page
>> - it unfolds migration_buffered_file. Only one user existed.
>> - it simplifies buffered_file a lot.
>>
>> - About the migration thread, special attention was given to trying to
>> get the series reviewable (reviewers will tell if I got it).
>>
>> Basic design:
>> - we create a new thread instead of a timer function
>> - we move all the migration work to that thread (but run everything
>> except the waits with the iothread lock held).
>> - we move all the writing outside the iothread lock, i.e. we walk
>> the state with the iothread lock held and copy everything to one buffer,
>> then we write that buffer to the sockets outside the iothread lock.
>> - once here, we move to writing synchronously to the sockets.
>> - this allows us to simplify quite a lot.
>>
>> And basically, that is it. Notice that we still do the page walking
>> in the iterate stage with the iothread lock held. Light testing shows
>> similar speed and latencies as without the thread (notice that
>> almost no optimizations have been done here yet).
>>
>> Apart from the review:
>> - Are there any locking issues that I have missed (I guess so)?
>> - Stopping all cpus correctly: vm_stop should be called from the
>> iothread, so I use the trick of using a bottom half to get that working
>> correctly, but this _implementation_ is ugly as hell. Is there an easy
>> way of doing it?
>> - Do I really have to export last_ram_offset()? Is there no other way
>> of knowing the amount of RAM?
>>
>> Known issues:
>>
>> - for some reason, when it has to start a 2nd round of bitmap
>> handling, it decides to dirty all pages. I still haven't found out why
>> this happens.
>>
>> If you can test it and tell me where it breaks, that would also help.
>>
>> The work is based on Umesh's thread work, and on work that Paolo Bonzini
>> did on top of that. The whole migration thread was done from scratch
>> because I was unable to debug why it was failing, but it "owes" a lot
>> to the previous design.
>>
>> Thanks in advance, Juan.
>>
>> The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:
>>
>>   Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23 13:15:34 -0500)
>>
>> are available in the git repository at:
>>
>>
>>   http://repo.or.cz/r/qemu/quintela.git migration-thread-v1
>>
>> for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:
>>
>>   buffered_file: unfold buffered_append in buffered_put_buffer (2012-07-24 16:46:13 +0200)
>>
>>
>> Juan Quintela (23):
>>       buffered_file: g_realloc() can't fail
>>       savevm: Factorize ram globals reset in its own function
>>       ram: introduce migration_bitmap_set_dirty()
>>       ram: Introduce migration_bitmap_test_and_reset_dirty()
>>       ram: Export last_ram_offset()
>>       ram: introduce migration_bitmap_sync()
>>       Separate migration bitmap
>>       buffered_file: rename opaque to migration_state
>>       buffered_file: opaque is MigrationState
>>       buffered_file: unfold migrate_fd_put_buffer
>>       buffered_file: unfold migrate_fd_put_ready
>>       buffered_file: unfold migrate_fd_put_buffer
>>       buffered_file: unfold migrate_fd_put_buffer
>>       buffered_file: We can access directly to bandwidth_limit
>>       buffered_file: Move from using a timer to use a thread
>>       migration: make qemu_fopen_ops_buffered() return void
>>       migration: stop all cpus correctly
>>       migration: make writes blocking
>>       migration: remove unfreeze logic
>>       migration: take finer locking
>>       buffered_file: Unfold the trick to restart generating migration data
>>       buffered_file: don't flush on put buffer
>>       buffered_file: unfold buffered_append in buffered_put_buffer
>>
>> Paolo Bonzini (2):
>>       split MRU ram list
>>       BufferedFile: append, then flush
>>
>> Umesh Deshpande (2):
>>       add a version number to ram_list
>>       protect the ramlist with a separate mutex
>>
>>  arch_init.c      | 108 +++++++++++++++++++++++++-------
>>  buffered_file.c  | 179 +++++++++++++++++------------------------------------
>>  buffered_file.h  |  12 +---
>>  cpu-all.h        |  17 +++++-
>>  exec-obsolete.h  |  10 ---
>>  exec.c           |  45 +++++++++++---
>>  migration-exec.c |   2 -
>>  migration-fd.c   |   6 --
>>  migration-tcp.c  |   2 +-
>>  migration-unix.c |   2 -
>>  migration.c      | 111 ++++++++++++++-------------------
>>  migration.h      |   6 ++
>>  qemu-file.h      |   5 --
>>  savevm.c         |   5 --
>>  14 files changed, 249 insertions(+), 261 deletions(-)
>>
>> --
>> 1.7.10.4
>>
>