From: Chegu Vinod <chegu_vinod@hp.com>
To: Juan Jose Quintela Carreira <quintela@redhat.com>
Cc: Orit Wasserman <owasserm@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] FW: Fwd: [RFC 00/27] Migration thread (WIP)
Date: Fri, 27 Jul 2012 07:21:14 -0700
Message-ID: <5012A3DA.20904@hp.com>
In-Reply-To: <4168C988EBDF2141B4E0B6475B6A73D1165CDD@G6W2493.americas.hpqcorp.net>
On 7/27/2012 7:11 AM, Vinod, Chegu wrote:
>
> -----Original Message-----
> From: Juan Quintela [mailto:quintela@redhat.com]
> Sent: Friday, July 27, 2012 4:06 AM
> To: Vinod, Chegu
> Cc: qemu-devel@nongnu.org; Orit Wasserman
> Subject: Re: Fwd: [RFC 00/27] Migration thread (WIP)
>
> Chegu Vinod <chegu_vinod@hp.com> wrote:
>> On 7/26/2012 11:41 AM, Chegu Vinod wrote:
>>
>>
>>
>>
>> -------- Original Message --------
>>
>>
>> Subject: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
>>
>> Date: Tue, 24 Jul 2012 20:36:25 +0200
>>
>> From: Juan Quintela <quintela@redhat.com>
>>
>> To: qemu-devel@nongnu.org
>>
>>
>>
>>
>> Hi
>>
>> This series is on top of the migration-next-v5 series just posted.
>>
>> First of all, this is an RFC/work in progress. A lot of people
>> asked for it, and I would like a review of the design.
>>
>> Hello,
>>
>> Thanks for sharing this early/WIP version for evaluation.
>>
>> I'm still in the middle of the code review, but wanted to share a
>> couple of quick observations. I tried to use it to migrate a
>> 128G/10VCPU guest (speed set to 10G and downtime 2s):
>> once with no workload (i.e. an idle guest), and a second time with
>> SpecJBB running in the guest.
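>>
>> (For reference, a rough sketch of the monitor commands behind the
>> setup above; the destination URI is a placeholder:)
>>
>>   (qemu) migrate_set_speed 10G
>>   (qemu) migrate_set_downtime 2
>>   (qemu) migrate -d tcp:<dest-host>:4444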
>>
>> The idle guest case seemed to migrate fine...
>>
>>
>> capabilities: xbzrle: off
>> Migration status: completed
>> transferred ram: 3811345 kbytes
>> remaining ram: 0 kbytes
>> total ram: 134226368 kbytes
>> total time: 199743 milliseconds
>>
>>
>> In the case of SpecJBB I ran into issues during stage 3: the
>> source host's qemu and the guest hung. I need to debug this
>> more... (if you already have some hints, please let me know).
>>
>>
>> capabilities: xbzrle: off
>> Migration status: active
>> transferred ram: 127618578 kbytes
>> remaining ram: 2386832 kbytes
>> total ram: 134226368 kbytes
>> total time: 526139 milliseconds
>> (qemu) qemu_savevm_state_complete called
>> qemu_savevm_state_complete calling ram_save_complete
>>
>> <--- hung somewhere after this (I need to get more info).
>>
>>
>>
>>
>> It appears to be some race condition, as in some cases it hangs
>> and in other cases it succeeds.
> Weird guess: try using fewer vcpus, same RAM.
Ok, will try that.
> The way that we stop cpus is _hacky_, to put it mildly. Will try to think about that part.
Ok.
> Thanks for the testing. All my testing has been done with 8GB guests and 2 vcpus. Will try with more vcpus to see if it makes a difference.
>
>
>
>
>> (qemu) info migrate
>> capabilities: xbzrle: off
>> Migration status: completed
>> transferred ram: 129937687 kbytes
>> remaining ram: 0 kbytes
>> total ram: 134226368 kbytes
>> total time: 543228 milliseconds
> Hmm, _that_ is stranger. This means that it finished.
There are cases where the migration is finishing just fine... even with
larger guest configurations (256G/20VCPUs).
> Could you run qemu under gdb and send me the stack traces?
>
> I don't know your gdb thread kung-fu, so here are the instructions just in case:
>
>   gdb --args <exact qemu command line you used>
>   C-c                  to break in when it hangs
>   (gdb) info threads   to see all the running threads
>   (gdb) thread 1       (or whatever other number)
>   (gdb) bt             to get the backtrace of that thread
The hang is intermittent...
I ran it 4-5 times (under gdb) just now and I didn't see the issue :-(
> I am especially interested in the backtrace of the migration thread and of the iothread.
Will keep retrying with different configs and see if I get lucky in
reproducing it (under gdb).
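
Since the hang is intermittent, the fallback plan is to attach to the
already-running qemu once it wedges and dump every thread's backtrace
(the binary name below is just an example):

  gdb -p $(pidof qemu-system-x86_64)
  (gdb) thread apply all bt
  (gdb) detach
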
Vinod
>
> Thanks, Juan.
>
>> Need to review/debug...
>>
>> Vinod
>>
>>
>>
>> ---
>>
>> As with the non-migration-thread version, the SpecJBB workload
>> completed before the migration attempted to move to stage 3 (i.e.
>> the migration didn't converge while the workload was still active).
>>
>> BTW, with this version of the bits (i.e. while running SpecJBB,
>> which is supposed to dirty quite a bit of memory) I noticed that
>> there wasn't much change in the bandwidth usage of the dedicated
>> 10Gb private network link (it was still < ~1.5-3.0 Gb/sec). I
>> expected this to be a little better since we have a separate
>> thread... not sure what else is in play here (NUMA locality of
>> where the migration thread runs, or some other basic tuning in
>> the implementation?).
>>
>> I have a high-level design question... (perhaps folks have already
>> thought about it and categorized it as a potential future
>> optimization?)
>>
>> Would it be possible to offload the iothread completely [from all
>> migration related activity] and have one thread (with the
>> appropriate protection) get involved with gathering the list of
>> dirty pages, and one or more threads dedicated to pushing
>> multiple streams of data to saturate the allocated network
>> bandwidth? This may help with large + busy guests. Comments?
>> There are perhaps other implications of doing all of this (like
>> burning more host cpu cycles), but perhaps this can be made
>> configurable based on the user's needs... e.g. fewer but large
>> guests on a host with no oversubscription.
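>>
>> Purely as an illustration of the idea (not an implementation, and
>> every name here is made up): one thread produces dirty-page chunks
>> into a queue, and several sender threads drain it onto their own
>> sockets:
>>
>>   /* each sender thread: pop a chunk, write it to its own socket */
>>   static void *sender_thread(void *arg)
>>   {
>>       SenderCtx *ctx = arg;
>>       Chunk *c;
>>
>>       while ((c = chunk_queue_pop(ctx->queue)) != NULL) { /* NULL == done */
>>           write_full(ctx->sockfd, c->data, c->len);
>>           chunk_free(c);
>>       }
>>       return NULL;
>>   }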
>>
>> Thanks
>> Vinod
>>
>>
>>
>>
>> It does:
>> - get a new bitmap for migration, and that bitmap uses 1 bit per page
>> - it unfolds migration_buffered_file. Only one user existed.
>> - it simplifies buffered_file a lot.
>>
>> - About the migration thread: special attention was given to trying to
>> keep the series reviewable (reviewers will tell me whether I got it).
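>>
>> (The bitmap sizing is roughly one bit per target page, covering all
>> of RAM up to last_ram_offset(); a sketch, with everything marked
>> dirty at the start:)
>>
>>   int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>>   migration_bitmap = bitmap_new(ram_pages);
>>   bitmap_set(migration_bitmap, 0, ram_pages);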
>>
>> Basic design:
>> - we create a new thread instead of a timer function
>> - we move all the migration work to that thread (but run everything
>> except the waits with the iothread lock held)
>> - we move all the writing outside the iothread lock, i.e.
>> we walk the state with the iothread lock held and copy everything into one buffer,
>> then we write that buffer to the socket outside the iothread lock
>> - once here, we move to writing synchronously to the socket
>> - this allows us to simplify quite a lot.
>>
>> And basically, that is it. Notice that we still do the iterative page
>> walking with the iothread lock held. Light testing shows that we get
>> similar speed and latencies to the version without the thread (notice
>> that almost no optimization has been done here yet).
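>>
>> (Schematically, each iteration of the thread follows the
>> copy-under-lock / write-outside-lock pattern below;
>> qemu_mutex_lock_iothread()/unlock_iothread() are the real calls, the
>> other names are placeholders:)
>>
>>   static void *migration_thread_fn(void *opaque)
>>   {
>>       MigrationState *s = opaque;
>>
>>       while (!migration_iteration_done(s)) {
>>           qemu_mutex_lock_iothread();
>>           /* walk the dirty bitmap and copy pages into s->buffer */
>>           size_t len = save_state_to_buffer(s);
>>           qemu_mutex_unlock_iothread();
>>
>>           /* blocking write to the socket, outside the big lock */
>>           write_buffer_blocking(s, len);
>>       }
>>       return NULL;
>>   }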
>>
>> Apart from the review:
>> - Are there any locking issues that I have missed? (I guess so.)
>> - Stop all cpus correctly. vm_stop should be called from the iothread;
>> I use the trick of using a bottom half to get that working correctly,
>> but this _implementation_ is ugly as hell. Is there an easy way
>> of doing it?
>> - Do I really have to export last_ram_offset()? Is there no other way
>> of knowing the amount of RAM?
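>>
>> (The bottom-half trick mentioned above is roughly this;
>> qemu_bh_new()/qemu_bh_schedule() and vm_stop() are the real APIs,
>> the completion handshake is left out:)
>>
>>   /* runs in the iothread, where vm_stop() is safe to call */
>>   static void migrate_vm_stop_bh(void *opaque)
>>   {
>>       vm_stop(RUN_STATE_FINISH_MIGRATE);
>>   }
>>
>>   /* from the migration thread, at the start of stage 3: */
>>   QEMUBH *bh = qemu_bh_new(migrate_vm_stop_bh, NULL);
>>   qemu_bh_schedule(bh);
>>   /* ... then wait until the BH has actually run ... */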
>>
>> Known issues:
>>
>> - for some reason, when it has to start a 2nd round of bitmap
>> handling, it decides to dirty all pages. I still haven't found
>> out why this happens.
>>
>> If you can test it and tell me where it breaks, that would also help.
>>
>> This work is based on Umesh's thread work, and on work that Paolo
>> Bonzini did on top of that. The migration thread itself was redone
>> from scratch because I was unable to debug why the earlier version
>> was failing, but it "owes" a lot to the previous design.
>>
>> Thanks in advance, Juan.
>>
>> The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:
>>
>> Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23
>> 13:15:34 -0500)
>>
>> are available in the git repository at:
>>
>>
>> http://repo.or.cz/r/qemu/quintela.git migration-thread-v1
>>
>> for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:
>>
>> buffered_file: unfold buffered_append in buffered_put_buffer
>> (2012-07-24 16:46:13 +0200)
>>
>>
>> Juan Quintela (23):
>> buffered_file: g_realloc() can't fail
>> savevm: Factorize ram globals reset in its own function
>> ram: introduce migration_bitmap_set_dirty()
>> ram: Introduce migration_bitmap_test_and_reset_dirty()
>> ram: Export last_ram_offset()
>> ram: introduce migration_bitmap_sync()
>> Separate migration bitmap
>> buffered_file: rename opaque to migration_state
>> buffered_file: opaque is MigrationState
>> buffered_file: unfold migrate_fd_put_buffer
>> buffered_file: unfold migrate_fd_put_ready
>> buffered_file: unfold migrate_fd_put_buffer
>> buffered_file: unfold migrate_fd_put_buffer
>> buffered_file: We can access directly to bandwidth_limit
>> buffered_file: Move from using a timer to use a thread
>> migration: make qemu_fopen_ops_buffered() return void
>> migration: stop all cpus correctly
>> migration: make writes blocking
>> migration: remove unfreeze logic
>> migration: take finer locking
>> buffered_file: Unfold the trick to restart generating migration data
>> buffered_file: don't flush on put buffer
>> buffered_file: unfold buffered_append in buffered_put_buffer
>>
>> Paolo Bonzini (2):
>> split MRU ram list
>> BufferedFile: append, then flush
>>
>> Umesh Deshpande (2):
>> add a version number to ram_list
>> protect the ramlist with a separate mutex
>>
>> arch_init.c | 108 +++++++++++++++++++++++++-------
>> buffered_file.c | 179 +++++++++++++++++-------------------------------------
>> buffered_file.h | 12 +---
>> cpu-all.h | 17 +++++-
>> exec-obsolete.h | 10 ---
>> exec.c | 45 +++++++++++---
>> migration-exec.c | 2 -
>> migration-fd.c | 6 --
>> migration-tcp.c | 2 +-
>> migration-unix.c | 2 -
>> migration.c | 111 ++++++++++++++-------------------
>> migration.h | 6 ++
>> qemu-file.h | 5 --
>> savevm.c | 5 --
>> 14 files changed, 249 insertions(+), 261 deletions(-)