From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5012A3DA.20904@hp.com>
Date: Fri, 27 Jul 2012 07:21:14 -0700
From: Chegu Vinod
To: Juan Jose Quintela Carreira
Cc: Orit Wasserman, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] FW: Fwd: [RFC 00/27] Migration thread (WIP)
In-Reply-To: <4168C988EBDF2141B4E0B6475B6A73D1165CDD@G6W2493.americas.hpqcorp.net>
References: <1343155012-26316-1-git-send-email-quintela@redhat.com>
 <500EF579.5040607@redhat.com> <50118F45.6050909@hp.com>
 <5011B5EB.7080209@hp.com> <87zk6l5ur4.fsf@trasno.org>
 <4168C988EBDF2141B4E0B6475B6A73D1165CDD@G6W2493.americas.hpqcorp.net>

On 7/27/2012 7:11 AM, Vinod, Chegu wrote:
>
> -----Original Message-----
> From: Juan Quintela [mailto:quintela@redhat.com]
> Sent: Friday, July 27, 2012 4:06 AM
> To: Vinod, Chegu
> Cc: qemu-devel@nongnu.org; Orit Wasserman
> Subject: Re: Fwd: [RFC 00/27] Migration thread (WIP)
>
> Chegu Vinod wrote:
>> On 7/26/2012 11:41 AM, Chegu Vinod wrote:
>>
>> -------- Original Message --------
>> Subject: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
>> Date: Tue, 24 Jul 2012 20:36:25 +0200
>> From: Juan Quintela
>> To: qemu-devel@nongnu.org
>>
>> Hi,
>>
>> This series is on top of the migration-next-v5 series just posted.
>>
>> First of all, this is an RFC/work in progress.  A lot of people asked
>> for it, and I would like review of the design.
>>
>> Hello,
>>
>> Thanks for sharing this early/WIP version for evaluation.
>>
>> Still in the middle of code review, but I wanted to share a couple of
>> quick observations.  I tried to use it to migrate a 128G/10VCPU guest
>> (speed set to 10G and downtime to 2s), once with no workload (i.e. an
>> idle guest) and a second time with SpecJBB running in the guest.
>>
>> The idle guest case seemed to migrate fine:
>>
>> capabilities: xbzrle: off
>> Migration status: completed
>> transferred ram: 3811345 kbytes
>> remaining ram: 0 kbytes
>> total ram: 134226368 kbytes
>> total time: 199743 milliseconds
>>
>> In the SpecJBB case I ran into issues during stage 3... the source
>> host's qemu and the guest hung.  I need to debug this more (if you
>> already have some hints, please let me know):
>>
>> capabilities: xbzrle: off
>> Migration status: active
>> transferred ram: 127618578 kbytes
>> remaining ram: 2386832 kbytes
>> total ram: 134226368 kbytes
>> total time: 526139 milliseconds
>> (qemu) qemu_savevm_state_complete called
>> qemu_savevm_state_complete calling ram_save_complete
>>
>> <--- hung somewhere after this (need to get more info).
>>
>> It appears to be some race condition, as there are cases when it
>> hangs and cases when it succeeds.
> Weird guess: try to use fewer vcpus, same RAM.

Ok, will try that.

> The way that we stop cpus is _hacky_, to say the least.  Will try to
> think about that part.
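For context, here is a minimal sketch of the kind of bottom-half approach
being referred to (Juan describes it further down as "the trick of using a
bottom half" so that vm_stop() runs in the iothread).  This is not the code
from the series: the helper names and the completion signalling are invented
for the illustration, and the headers/APIs loosely follow the 2012-era QEMU
tree (qemu_bh_new/qemu_bh_schedule, vm_stop).

/*
 * Illustrative sketch only (not from the series): the migration thread asks
 * the iothread to stop the guest CPUs by scheduling a bottom half, instead
 * of calling vm_stop() from the wrong thread.  Header names follow the
 * 2012-era QEMU tree and may not match exactly.
 */
#include "qemu-common.h"   /* QEMUBH, qemu_bh_new(), qemu_bh_schedule() */
#include "qemu-thread.h"   /* QemuMutex, QemuCond                       */
#include "sysemu.h"        /* vm_stop(), RUN_STATE_FINISH_MIGRATE       */

typedef struct StopRequest {
    QemuMutex lock;
    QemuCond cond;
    bool done;
} StopRequest;

/* Runs in the iothread when the bottom half fires. */
static void stop_cpus_bh(void *opaque)
{
    StopRequest *req = opaque;

    vm_stop(RUN_STATE_FINISH_MIGRATE);  /* safe here: we are in the iothread */

    qemu_mutex_lock(&req->lock);
    req->done = true;
    qemu_cond_signal(&req->cond);
    qemu_mutex_unlock(&req->lock);
}

/* Called from the migration thread when it wants the guest stopped. */
static void migration_request_vm_stop(void)
{
    StopRequest req = { .done = false };
    QEMUBH *bh;

    qemu_mutex_init(&req.lock);
    qemu_cond_init(&req.cond);

    bh = qemu_bh_new(stop_cpus_bh, &req);
    qemu_bh_schedule(bh);               /* the iothread will run it shortly */

    qemu_mutex_lock(&req.lock);
    while (!req.done) {
        qemu_cond_wait(&req.cond, &req.lock);
    }
    qemu_mutex_unlock(&req.lock);

    qemu_bh_delete(bh);                 /* mutex/cond cleanup elided */
}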
Ok.

> Thanks for the testing.  All my testing has been done with 8GB guests
> and 2 vcpus.  Will try with more vcpus to see if it makes a difference.
>
>> (qemu) info migrate
>> capabilities: xbzrle: off
>> Migration status: completed
>> transferred ram: 129937687 kbytes
>> remaining ram: 0 kbytes
>> total ram: 134226368 kbytes
>> total time: 543228 milliseconds
> Humm, _that_ is more strange.  This means that it finished.

There are cases where the migration finishes just fine... even with
larger guest configurations (256G/20VCPUs).

> Could you run qemu under gdb and send me the stack traces?
>
> I don't know your gdb thread kung-fu, so here are the instructions
> just in case:
>
> gdb --args <qemu command line>
> C-c to break when it hangs
> (gdb) info threads
>     you see all the threads running
> (gdb) thread 1
>     or whatever other number
> (gdb) bt
>     the backtrace of that thread

The hang is intermittent... I ran it 4-5 times (under gdb) just now and
I didn't see the issue :-(

> I am especially interested in the backtrace of the migration thread
> and of the iothread.

Will keep re-trying with different configs and see if I get lucky in
reproducing it (under gdb).

Vinod

> Thanks, Juan.
>
>> Need to review/debug...
>>
>> Vinod
>>
>> ---
>>
>> As with the non-migration-thread version, the SpecJBB workload
>> completed before the migration attempted to move to stage 3 (i.e. it
>> didn't converge while the workload was still active).
>>
>> BTW, with this version of the bits (i.e. while running SpecJBB, which
>> is supposed to dirty quite a bit of memory) I noticed that there
>> wasn't much change in the bandwidth usage of the dedicated 10Gb
>> private network link (it was still < ~1.5-3.0Gb/sec).  I expected
>> this to be a little better since we have a separate thread... not
>> sure what else is in play here?  (NUMA locality of where the
>> migration thread runs, or some other basic tuning in the
>> implementation?)
>>
>> I have a high-level design question... (perhaps folks have already
>> thought about it and categorized it as a potential future
>> optimization?)
>>
>> Would it be possible to offload the iothread completely [from all
>> migration-related activity] and have one thread (with the appropriate
>> protection) get involved with getting the list of the dirty pages?
>> Have one or more threads dedicated to trying to push multiple streams
>> of data to saturate the allocated network bandwidth?  This may help
>> with large + busy guests.  Comments?
>> There are perhaps other implications of doing all of this (like
>> burning more host cpu cycles), but perhaps this can be made
>> configurable based on the user's needs... e.g. fewer but larger
>> guests on a host with no oversubscription.
>>
>> Thanks
>> Vinod
>>
>> It does:
>> - get a new bitmap for migration, and that bitmap uses 1 bit per page
>> - it unfolds migration_buffered_file (only one user existed)
>> - it simplifies buffered_file a lot
>>
>> - About the migration thread: special attention was given to trying
>>   to keep the series reviewable (reviewers will tell me if I got it).
>>
>> Basic design:
>> - we create a new thread instead of a timer function
>> - we move all the migration work to that thread (but run everything
>>   except the waits with the iothread lock held)
>> - we move all the writing outside the iothread lock, i.e. we walk the
>>   state with the iothread lock held and copy everything to one
>>   buffer; then we write that buffer to the sockets outside the
>>   iothread lock
>> - once here, we move to writing synchronously to the sockets
>> - this allows us to simplify quite a lot
>>
>> And basically, that is it.  Notice that we still do the iterative
>> page walking with the iothread lock held.  Light testing shows that
>> we get similar speed and latencies to the version without the thread
>> (note that almost no optimizations have been done here yet).
>>
>> Apart from the review:
>> - Are there any locking issues that I have missed?  (I guess so.)
>> - Stop all cpus correctly.  vm_stop should be called from the
>>   iothread; I use the trick of using a bottom half to get that
>>   working correctly, but this _implementation_ is ugly as hell.  Is
>>   there an easy way of doing it?
>> - Do I really have to export last_ram_offset()?  Is there no other
>>   way of knowing the amount of RAM?
>>
>> Known issues:
>>
>> - for some reason, when it has to start a 2nd round of bitmap
>>   handling, it decides to dirty all pages.  Still haven't found why
>>   this happens.
>>
>> If you can test it and tell me where it breaks, that would also help.
>>
>> Work is based on Umesh's thread work, and on work that Paolo Bonzini
>> did on top of that.  All the migration thread code was done from
>> scratch because I was unable to debug why it was failing, but it
>> "owes" a lot to the previous design.
>>
>> Thanks in advance, Juan.
>>
>> The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:
>>
>>   Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23 13:15:34 -0500)
>>
>> are available in the git repository at:
>>
>>   http://repo.or.cz/r/qemu/quintela.git migration-thread-v1
>>
>> for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:
>>
>>   buffered_file: unfold buffered_append in buffered_put_buffer (2012-07-24 16:46:13 +0200)
>>
>> Juan Quintela (23):
>>       buffered_file: g_realloc() can't fail
>>       savevm: Factorize ram globals reset in its own function
>>       ram: introduce migration_bitmap_set_dirty()
>>       ram: Introduce migration_bitmap_test_and_reset_dirty()
>>       ram: Export last_ram_offset()
>>       ram: introduce migration_bitmap_sync()
>>       Separate migration bitmap
>>       buffered_file: rename opaque to migration_state
>>       buffered_file: opaque is MigrationState
>>       buffered_file: unfold migrate_fd_put_buffer
>>       buffered_file: unfold migrate_fd_put_ready
>>       buffered_file: unfold migrate_fd_put_buffer
>>       buffered_file: unfold migrate_fd_put_buffer
>>       buffered_file: We can access directly to bandwidth_limit
>>       buffered_file: Move from using a timer to use a thread
>>       migration: make qemu_fopen_ops_buffered() return void
>>       migration: stop all cpus correctly
>>       migration: make writes blocking
>>       migration: remove unfreeze logic
>>       migration: take finer locking
>>       buffered_file: Unfold the trick to restart generating migration data
>>       buffered_file: don't flush on put buffer
>>       buffered_file: unfold buffered_append in buffered_put_buffer
>>
>> Paolo Bonzini (2):
>>       split MRU ram list
>>       BufferedFile: append, then flush
>>
>> Umesh Deshpande (2):
>>       add a version number to ram_list
>>       protect the ramlist with a separate mutex
>>
>>  arch_init.c      |  108 +++++++++++++++++++++++++-------
>>  buffered_file.c  |  179 +++++++++++++++++-------------------------------
>>  buffered_file.h  |   12 +---
>>  cpu-all.h        |   17 +++++-
>>  exec-obsolete.h  |   10 ---
>>  exec.c           |   45 +++++++++++---
>>  migration-exec.c |    2 -
>>  migration-fd.c   |    6 --
>>  migration-tcp.c  |    2 +-
>>  migration-unix.c |    2 -
>>  migration.c      |  111 ++++++++++++++------------------
>>  migration.h      |    6 ++
>>  qemu-file.h      |    5 --
>>  savevm.c         |    5 --
>>  14 files changed, 249 insertions(+), 261 deletions(-)
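To make the "basic design" notes quoted above a bit more concrete (walk and
copy the dirty state while the iothread lock is held, then write the buffer
to the socket with blocking I/O after dropping the lock), here is a rough,
generic sketch.  Every name in it (iothread_lock, copy_dirty_pages,
migration_done, the buffer size) is invented for the illustration; it is not
the code from the series.

/*
 * Illustration only: the copy-under-lock / write-outside-lock pattern from
 * the design notes above.  The extern declarations are hypothetical
 * stand-ins for QEMU internals.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

extern pthread_mutex_t iothread_lock;                   /* the "big lock" */
extern size_t copy_dirty_pages(void *buf, size_t len);  /* fill buf, return bytes */
extern bool migration_done(void);                       /* converged or cancelled */

static char buffer[4 * 1024 * 1024];

/* Migration thread body: replaces the old timer-driven callback. */
static void *migration_thread(void *opaque)
{
    int sock_fd = *(int *)opaque;       /* blocking socket to the destination */

    while (!migration_done()) {
        size_t filled, off = 0;

        /* Walk the guest state and copy it into the buffer under the lock... */
        pthread_mutex_lock(&iothread_lock);
        filled = copy_dirty_pages(buffer, sizeof(buffer));
        pthread_mutex_unlock(&iothread_lock);

        /* ...but do the (possibly slow) blocking writes without holding it. */
        while (off < filled) {
            ssize_t n = write(sock_fd, buffer + off, filled - off);
            if (n < 0) {
                return NULL;            /* error handling elided */
            }
            off += (size_t)n;
        }
    }
    return NULL;
}

/* The iothread would spawn this once instead of arming a timer. */
int start_migration_thread(int sock_fd)
{
    static int fd;                      /* must outlive this function */
    pthread_t tid;

    fd = sock_fd;
    return pthread_create(&tid, NULL, migration_thread, &fd);
}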