From: Chegu Vinod <chegu_vinod@hp.com>
To: qemu-devel@nongnu.org, Juan Jose Quintela Carreira <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition
Date: Wed, 24 Oct 2012 07:29:26 -0700 [thread overview]
Message-ID: <5087FB46.3050109@hp.com> (raw)
In-Reply-To: <5087F1F5.5060606@hp.com>
On 10/24/2012 6:49 AM, Chegu Vinod wrote:
> On 10/24/2012 6:40 AM, Vinod, Chegu wrote:
>>
>> Hi
>>
>> This series applies on top of the refactoring that I sent yesterday.
>>
>> Changes from the last version include:
>>
>> - buffered_file.c is gone; its functionality is merged into migration.c.
>>   Pay special attention to the merge of buffered_file_thread() &
>>   migration_file_put_notify().
>>
>> - Some more bitmap handling optimizations (thanks to Orit & Paolo for
>>   suggestions and code, and Vinod for testing)
>>
>> Please review. Included is the pointer to the full tree.
>>
>> Thanks, Juan.
>>
>> The following changes since commit b6348f29d033d5a8a26f633d2ee94362595f32a4:
>>
>> target-arm/translate: Fix RRX operands (2012-10-17 19:56:46 +0200)
>>
>> are available in the git repository at:
>>
>> http://repo.or.cz/r/qemu/quintela.git migration-thread-20121017
>>
>> for you to fetch changes up to 486dabc29f56d8f0e692395d4a6cd483b3a77f01:
>>
>> ram: optimize migration bitmap walking (2012-10-18 09:20:34 +0200)
>>
>> v3:
>>
>> This is work in progress on top of the previous migration series just
>> sent.
>>
>> - Introduces a thread for migration instead of using a timer and callback
>>
>> - remove the writing to the fd from under the iothread lock
>>
>> - make the writes synchronous
>>
>> - Introduce a new pending method that returns how many bytes are still
>>   pending for one save live section (a rough sketch of such a hook is
>>   given after this list)
>>
>> - last patch just shows printfs to see where the time is being spent
>>   on the migration completion phase
>>   (yes, it pollutes all uses of stop on the monitor)
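>>
>> The pending hook mentioned above could look roughly like this (a sketch
>> only; the name save_live_pending and the exact signature are assumptions
>> here, not necessarily what this series ends up using):
>>
>>     /* Hypothetical shape of the per-section "pending" handler.  It
>>      * returns how many bytes the section still has to send; the
>>      * migration thread can compare that against max_size (what fits
>>      * in the downtime budget) to decide whether to iterate again or
>>      * to move on to the completion stage. */
>>     uint64_t (*save_live_pending)(QEMUFile *f, void *opaque,
>>                                   uint64_t max_size);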
>>
>> So far I have found that we spend a lot of time in bdrv_flush_all(). It
>> can take from 1ms to 600ms (yes, that is not a typo). That dwarfs the
>> default migration downtime (30ms).
>>
>> Stop all vcpus:
>>
>> - it works now (after the changes to qemu_cpu_is_vcpu in the previous
>>   series); the caveat is that the time that bdrv_flush_all() takes is
>>   "unpredictable". Any silver bullets?
>>
>> Paolo suggested to call, for the migration completion phase:
>>
>>   bdrv_aio_flush_all();
>>   send the dirty pages;
>>   bdrv_drain_all();
>>   bdrv_flush_all();
>>   another round through the bitmap, in case completions have dirtied
>>   some pages
>>
>> Paolo, did I get it right?
>>
>> Any other suggestion?
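>>
>> As a minimal sketch of that ordering (migration_send_dirty_pages() is a
>> hypothetical placeholder for "send the dirty pages", and the bdrv_*
>> names are simply taken from the suggestion above):
>>
>>     static void migration_completion_sketch(QEMUFile *f)
>>     {
>>         bdrv_aio_flush_all();          /* start flushes early, asynchronously */
>>         migration_send_dirty_pages(f); /* send what is dirty right now        */
>>         bdrv_drain_all();              /* wait for outstanding block I/O      */
>>         bdrv_flush_all();              /* the synchronous flush is now cheap  */
>>         migration_send_dirty_pages(f); /* completions may have dirtied pages,
>>                                         * so walk the bitmap once more        */
>>     }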
>>
>> - migrate_cancel() is not properly implemented (as in: we take no
>>   locks, ...)
>>
>> - expected_downtime is not calculated.
>>
>> I am about to merge migrate_fd_put_ready() & buffered_thread(), and
>> that would make it trivial to calculate.
>>
>> It outputs something like:
>>
>> wakeup_request 0
>> time cpu_disable_ticks 0
>> time pause_all_vcpus 1
>> time runstate_set 1
>> time vmstate_notify 2
>> time bdrv_drain_all 2
>> time flush device /dev/disk/by-path/ip-192.168.10.200:3260-iscsi-iqn.2010-12.org.trasno:iscsi.lvm-lun-1: 3
>> time flush device : 3
>> time flush device : 3
>> time flush device : 3
>> time bdrv_flush_all 5
>> time monitor_protocol_event 5
>> vm_stop 2 5
>> synchronize_all_states 1
>> migrate RAM 37
>> migrate rest devices 1
>> complete without error 3a 44
>> completed 45
>> end completed stage 45
>>
>> As you can see, we estimate that we can send all pending data in 30ms;
>> it took 37ms to send the RAM (that is what we calculate), so the
>> estimation is quite good.
>>
>> What gives me lots of variation is the line with a device name, "time
>> flush device". That is what varies between 1ms and 600ms.
>>
>> This is in a completely idle guest. I am running:
>>
>> /* Standalone version of the test; main(), the includes and the ms2us()
>>  * helper are assumed here, only the loop itself was given. */
>> #include <stdio.h>
>> #include <inttypes.h>
>> #include <unistd.h>
>> #include <sys/time.h>
>>
>> #define ms2us(ms) ((ms) * 1000)
>>
>> int main(void)
>> {
>>     struct timeval t0, t1;
>>
>>     while (1) {
>>         uint64_t delay;
>>         if (gettimeofday(&t0, NULL) != 0)
>>             perror("gettimeofday 1");
>>         if (usleep(ms2us(10)) != 0)        /* ask for a 10ms sleep */
>>             perror("usleep");
>>         if (gettimeofday(&t1, NULL) != 0)
>>             perror("gettimeofday 2");
>>
>>         /* t1 -= t0, normalizing the microseconds field */
>>         t1.tv_usec -= t0.tv_usec;
>>         if (t1.tv_usec < 0) {
>>             t1.tv_usec += 1000000;
>>             t1.tv_sec--;
>>         }
>>         t1.tv_sec -= t0.tv_sec;
>>
>>         delay = t1.tv_sec * 1000 + t1.tv_usec / 1000;
>>         if (delay > 100)                   /* report anything over 100ms */
>>             printf("delay of %" PRIu64 " ms\n", delay);
>>     }
>> }
>>
>> That is, it measures the latency inside the guest: ask for a 10ms sleep
>> and see how long it actually takes.
>>
>> [root@d1 ~]# ./timer
>>
>> delay of 161 ms
>>
>> delay of 135 ms
>>
>> delay of 143 ms
>>
>> delay of 132 ms
>>
>> delay of 131 ms
>>
>> delay of 141 ms
>>
>> delay of 113 ms
>>
>> delay of 119 ms
>>
>> delay of 114 ms
>>
>> But those values are independent of migration. Without even starting
>> the migration, with an idle guest doing nothing, we see them sometimes.
>>
>> Juan Quintela (27):
>>
>> buffered_file: Move from using a timer to use a thread
>>
>> migration: make qemu_fopen_ops_buffered() return void
>>
>> migration: stop all cpus correctly
>>
>> migration: make writes blocking
>>
>> migration: remove unfreeze logic
>>
>> migration: take finer locking
>>
>> buffered_file: Unfold the trick to restart generating migration data
>>
>> buffered_file: don't flush on put buffer
>>
>> buffered_file: unfold buffered_append in buffered_put_buffer
>>
>> savevm: New save live migration method: pending
>>
>> migration: include qemu-file.h
>>
>> migration-fd: remove duplicate include
>>
>> migration: move buffered_file.c code into migration.c
>>
>> migration: move migration_fd_put_ready()
>>
>> migration: Inline qemu_fopen_ops_buffered into migrate_fd_connect
>>
>> migration: move migration notifier
>>
>> migration: move begining stage to the migration thread
>>
>> migration: move exit condition to migration thread
>>
>> migration: unfold rest of migrate_fd_put_ready() into thread
>>
>> migration: print times for end phase
>>
>> ram: rename last_block to last_seen_block
>>
>> ram: Add last_sent_block
>>
>> memory: introduce memory_region_test_and_clear_dirty
>>
>> ram: Use memory_region_test_and_clear_dirty
>>
>> fix memory.c
>>
>> migration: Only go to the iterate stage if there is anything to send
>>
>> ram: optimize migration bitmap walking
>>
>> Paolo Bonzini (1):
>>
>> split MRU ram list
>>
>> Umesh Deshpande (2):
>>
>> add a version number to ram_list
>>
>> protect the ramlist with a separate mutex
>>
>> Makefile.objs | 2 +-
>>
>> arch_init.c | 133 +++++++++++--------
>>
>> block-migration.c | 49 ++-----
>>
>> block.c | 6 +
>>
>> buffered_file.c | 256 -----------------------------------
>>
>> buffered_file.h | 22 ---
>>
>> cpu-all.h | 13 +-
>>
>> cpus.c | 17 +++
>>
>> exec.c | 44 +++++-
>>
>> memory.c | 17 +++
>>
>> memory.h | 18 +++
>>
>> migration-exec.c | 4 +-
>>
>> migration-fd.c | 9 +-
>>
>> migration-tcp.c | 21 +--
>>
>> migration-unix.c | 4 +-
>>
>> migration.c | 391 ++++++++++++++++++++++++++++++++++++++++--------------
>>
>> migration.h | 4 +-
>>
>> qemu-file.h | 5 -
>>
>> savevm.c | 37 +++++-
>>
>> sysemu.h | 1 +
>>
>> vmstate.h | 1 +
>>
>> 21 files changed, 522 insertions(+), 532 deletions(-)
>>
>> delete mode 100644 buffered_file.c
>>
>> delete mode 100644 buffered_file.h
>>
>> --
>>
>> 1.7.11.7
>>
>
>
> Tested-by: Chegu Vinod <chegu_vinod@hp.com>
>
>
> Using these patches I have verified live migration (on x86_64
> platforms) for guest sizes varying from 64G/10vcpus through
> 768G/80vcpus, and I have seen a reduction in both the downtime and the
> total migration time. The dirty bitmap optimizations have shown
> improvements too and have helped reduce the downtime (perhaps more can
> be done as a next step, i.e. after the above changes (minus the
> printf's) make it into upstream). The new migration stats that were
> added were useful too!
>
> Thanks
> Vinod
>
Wanted to follow up on an issue that I had observed... <Already shared
this with Juan/Orit/Paolo but forgot to mention it in the email above!>
As mentioned above, for larger (>= 256G) guests the cost of the dirty
bitmap sync-up is high. During the very start of the migration, i.e. in
ram_save_setup(), I noticed that a lot of time was being spent syncing
up the dirty bitmaps (and also, perhaps, marking the pages as dirty);
this leads to a multi-second freeze of the guest. As part of optimizing
the dirty bitmap sync-up, this issue needs to be addressed.
Thanks
Vinod