From: Chegu Vinod <chegu_vinod@hp.com>
To: Juan Quintela <quintela@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC 0/7] Fix migration with lots of memory
Date: Sun, 10 Jun 2012 20:56:35 -0700
Message-ID: <4FD56C73.8080708@hp.com>
In-Reply-To: <cover.1337710679.git.quintela@redhat.com>
Hello,
I picked up these patches a while back and ran some migration tests while
running simple workloads in the guest. Below are some results.
FYI...
Vinod
----
Config Details:
Guest: 10 vCPUs, 60GB RAM (running on a host with 6 cores (12 threads) and 64GB).
The hosts are identical x86_64 blade servers and are connected via a private
10G link (used for the migration traffic).
The guest was started using qemu directly (no virsh/virt-manager etc.).
Migration was initiated at the qemu monitor prompt,
and migrate_set_speed was used to set the limit to 10G. No changes
to the downtime.
Software:
- Guest & Host OS : 3.4.0-rc7+
- Vanilla : basic upstream qemu.git
- huge_memory changes (Juan's qemu.git tree)
[Note: BTW, I also tried v11 of the XBZRLE patches, but ran into issues (the guest crashed
after migration); I have reported it to the author.]
Here are the simple "workloads" and results:
1) Idling guest
2) AIM7-compute (with 2000 users).
3) 10-way parallel make (of the kernel)
4) 2 instances of memory r/w loop (exactly the same as in docs/xbzrle.txt)
5) SPECJbb2005
Note: In the Vanilla case I had instrumented ram_save_live()
to print out the total migration time and the MBs transferred.
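For reference, the instrumentation was along these lines (a minimal,
self-contained sketch for illustration only, not the actual change; all
of the names below are made up):

    #include <inttypes.h>
    #include <stdio.h>
    #include <sys/time.h>

    /* Take a timestamp when the RAM save starts, count bytes as pages are
     * pushed out, and print the totals once migration completes. */

    static int64_t mig_start_ms;
    static uint64_t mig_bytes_sent;

    static int64_t clock_ms(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (int64_t)tv.tv_sec * 1000 + tv.tv_usec / 1000;
    }

    static void mig_stats_start(void)           /* when stage 1 begins */
    {
        mig_start_ms = clock_ms();
        mig_bytes_sent = 0;
    }

    static void mig_stats_account(size_t bytes) /* once per page/chunk sent */
    {
        mig_bytes_sent += bytes;
    }

    static void mig_stats_report(void)          /* when stage 3 finishes */
    {
        printf("Total Mig. time: %" PRId64 " ms\n",
               clock_ms() - mig_start_ms);
        printf("Total MBs transferred : %" PRIu64 " MB\n",
               mig_bytes_sent >> 20);
    }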
                                      Vanilla                   huge_memory
Workload                         Mig. time   MBs xferred   Mig. time   MBs xferred
1) Idling guest                  173016 ms     1606 MB      48821 ms     1620 MB
2) AIM7-compute (2000 users)     241124 ms     4827 MB      66716 ms     4022 MB
3) 10-way parallel kernel make   104319 ms     2316 MB      55105 ms     2995 MB
4) 2 x memory r/w loop           112102 ms     1739 MB      85504 ms     1745 MB
5) SPECJbb2005                   162189 ms     5461 MB      67787 ms     8528 MB
[Expected] Observation:
Unlike in the Vanilla case (and also the XBZRLE case), with these patches I was still able
to interact with the qemu monitor prompt and with the guest during the migration (i.e. during the iterative pre-copy phase).
------
On 5/22/2012 11:32 AM, Juan Quintela wrote:
> Hi
>
> After a long, long time, this is v2.
>
> These are basically the changes that we have for RHEL, due to the
> problems that we have with big-memory machines. I just rebased the
> patches and fixed the easy parts:
>
> - buffered_file_limit is gone: we just use 50ms and call it a day
>
> - I kept ram_addr_t as a valid type for a counter (no, I still don't
> agree with Anthony on this, but it is not important).
>
> - Print the total time of migration always. Notice that I also print it
> when migration is completed. Luiz, could you take a look to see if
> I did something wrong (probably)?
>
> - Moved debug printfs to tracepoints. Thanks a lot to Stefan for
> helping with it. Once here, I had to put the traces in the middle
> of the trace-events file; if I put them at the end of the file, when I
> enabled them I got the previous two tracepoints generated instead
> of the ones I had just defined. Stefan is looking into that. The
> workaround is defining them somewhere else.
>
> - Exit from cpu_physical_memory_reset_dirty(). Anthony wanted me to
> create an empty stub for kvm and maintain the code for tcg. The
> problem is that we can have both kvm and tcg running from the same
> binary. Instead of exiting in the middle of the function, I just
> refactored the code out (a rough sketch follows after this list). Is
> there a struct where I could add a new function pointer for this behaviour?
>
> - Exit if we have been too long in the ram_save_live() loop. Anthony
> didn't like this; I will send a version based on the migration
> thread in the following days, but we just need something working for
> other people to test.
>
> Notice that I still got "lots" of more-than-50ms printfs. (Yes,
> there is a debugging printf there.)
>
> - Bitmap handling. All the code to count dirty pages is still there; I will
> try to get something saner based on bitmap optimizations.
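>
> As mentioned in the cpu_physical_memory_reset_dirty() item above, the
> refactoring has roughly this shape. This is only a sketch: the helper
> names clear_dirty_bitmap_range() and tlb_reset_dirty_range_all() are
> placeholders, not the actual code.
>
>     void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
>                                          int dirty_flags)
>     {
>         /* Clearing the bits in the dirty bitmap is cheap and is needed
>          * for both kvm and tcg. */
>         clear_dirty_bitmap_range(start, end, dirty_flags);
>
>         /* Only tcg caches dirty information in its software TLB entries;
>          * rewriting those entries is the expensive walk over all CPUs.
>          * kvm does not need it, and since one binary can run either
>          * accelerator this has to be a runtime check rather than a
>          * compile-time stub. */
>         if (tcg_enabled()) {
>             tlb_reset_dirty_range_all(start, end, dirty_flags);
>         }
>     }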
>
> Comments?
>
> Later, Juan.
>
>
>
>
> v1:
> ---
>
> Executive Summary
> -----------------
>
> This series of patches fixes migration with lots of memory. With them, stalls
> are removed, and we honor max_downtime.
> I also add infrastructure to measure what is happening during migration
> (#define DEBUG_MIGRATION and DEBUG_SAVEVM).
>
> Migration is broken at the moment in the qemu tree; Michael's patch is needed to
> fix virtio migration. Measurements are given for the qemu-kvm tree. At the end, some measurements
> with the qemu tree are included.
>
> Long Version with measurements (for those that like numbers O:-)
> ------------------------------
>
> 8 vCPUs and 64GB RAM, a RHEL5 guest that is completely idle
>
> initial
> -------
>
> savevm: save live iterate section id 3 name ram took 3266 milliseconds 46 times
>
> We have 46 stalls and missed the 100ms deadline 46 times;
> the stalls took around 3.5 to 3.6 seconds each.
>
> savevm: save devices took 1 milliseconds
>
> in case you had any doubt, the rest of the devices (everything but RAM) took less than 1ms, so
> we don't care about optimizing them for now.
>
> migration: ended after 207411 milliseconds
>
> total migration took 207 seconds for this guest
>
> samples % image name symbol name
> 2161431 72.8297 qemu-system-x86_64 cpu_physical_memory_reset_dirty
> 379416 12.7845 qemu-system-x86_64 ram_save_live
> 367880 12.3958 qemu-system-x86_64 ram_save_block
> 16647 0.5609 qemu-system-x86_64 qemu_put_byte
> 10416 0.3510 qemu-system-x86_64 kvm_client_sync_dirty_bitmap
> 9013 0.3037 qemu-system-x86_64 qemu_put_be32
>
> Clearly, we are spending too much time on cpu_physical_memory_reset_dirty.
>
> ping results during the migration.
>
> rtt min/avg/max/mdev = 474.395/39772.087/151843.178/55413.633 ms, pipe 152
>
> You can see that the mean and maximum values are quite big.
>
> We got the dreaded "CPU soft lockup for 10s" messages in the guest.
>
> No need to iterate if we already are over the limit
> ---------------------------------------------------
>
> Numbers similar to previous ones.
>
> KVM doesn't care about TLB handling
> -----------------------------------
>
> savevm: save live iterate section id 3 name ram took 466 milliseconds 56 times
>
> 56 stalls, but much smaller, between 0.5 and 1.4 seconds
>
> migration: ended after 115949 milliseconds
>
> total time has improved a lot. 115 seconds.
>
> samples % image name symbol name
> 431530 52.1152 qemu-system-x86_64 ram_save_live
> 355568 42.9414 qemu-system-x86_64 ram_save_block
> 14446 1.7446 qemu-system-x86_64 qemu_put_byte
> 11856 1.4318 qemu-system-x86_64 kvm_client_sync_dirty_bitmap
> 3281 0.3962 qemu-system-x86_64 qemu_put_be32
> 2426 0.2930 qemu-system-x86_64 cpu_physical_memory_reset_dirty
> 2180 0.2633 qemu-system-x86_64 qemu_put_be64
>
> notice how cpu_physical_memory_reset_dirty() uses much less time.
>
> rtt min/avg/max/mdev = 474.438/1529.387/15578.055/2595.186 ms, pipe 16
>
> ping values from outside to the guest have improved a bit, but are still
> bad.
>
> Exit loop if we have been there too long
> ----------------------------------------
>
> not a single stall bigger than 100ms
>
> migration: ended after 157511 milliseconds
>
> not as good a time as the previous one, but we have removed the stalls.
>
> samples % image name symbol name
> 1104546 71.8260 qemu-system-x86_64 ram_save_live
> 370472 24.0909 qemu-system-x86_64 ram_save_block
> 30419 1.9781 qemu-system-x86_64 kvm_client_sync_dirty_bitmap
> 16252 1.0568 qemu-system-x86_64 qemu_put_byte
> 3400 0.2211 qemu-system-x86_64 qemu_put_be32
> 2657 0.1728 qemu-system-x86_64 cpu_physical_memory_reset_dirty
> 2206 0.1435 qemu-system-x86_64 qemu_put_be64
> 1559 0.1014 qemu-system-x86_64 qemu_file_rate_limit
>
>
> You can see that ping times are improving
> rtt min/avg/max/mdev = 474.422/504.416/628.508/35.366 ms
>
> now the maximum is near the minimum, at reasonable values.
>
> The limit in the stage 2 loop has been set to 50ms because
> buffered_file runs a timer every 100ms. If we miss that timer, we end up
> in trouble. So I used 100/2.
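>
> A rough sketch of what the check in the stage 2 loop looks like (the
> names bytes_transferred and MAX_SAVE_BLOCK_TIME_MS and the
> check-every-64-pages detail are illustrative, not necessarily the exact
> patch):
>
>     /* Bound one stage 2 pass of ram_save_live() to 50ms, i.e. half of
>      * the 100ms buffered_file timer period. */
>     #define MAX_SAVE_BLOCK_TIME_MS 50
>
>     int64_t t0 = qemu_get_clock_ms(rt_clock);
>     int i = 0;
>
>     while (!qemu_file_rate_limit(f)) {
>         int bytes_sent = ram_save_block(f);
>         bytes_transferred += bytes_sent;
>         if (bytes_sent == 0) {          /* no dirty pages left right now */
>             break;
>         }
>         /* Reading the clock is not free, so only check it every 64 pages. */
>         if ((++i & 63) == 0 &&
>             qemu_get_clock_ms(rt_clock) - t0 > MAX_SAVE_BLOCK_TIME_MS) {
>             break;
>         }
>     }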
>
> I tried other values: 15ms (max_downtime/2, so it could be set by the
> user), but that gave too long a total time (~400 seconds).
>
> I tried bigger values, 75ms and 100ms, but with either of them we got
> stalls, sometimes as big as 1s, as we lose some timer runs and then the
> calculations are wrong.
>
> With this patch, the soft lockups are gone.
>
> Change calculation to exit live migration
> -----------------------------------------
>
> we spent too much time in ram_save_live(); the problem is the
> calculation of the number of dirty pages (ram_save_remaining()). Instead
> of walking the bitmap each time we need the value, we just
> maintain a count of dirty pages, updated each time we change a bit
> in the bitmap.
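>
> In code the idea is roughly the following. Only ram_save_remaining() is
> a real function name here; the bitmap accessors are placeholders:
>
>     static uint64_t migration_dirty_pages;    /* kept in sync with the bitmap */
>
>     static void dirty_page_set(ram_addr_t addr)
>     {
>         if (!page_is_migration_dirty(addr)) {  /* count only 0 -> 1 changes */
>             migration_dirty_pages++;
>         }
>         bitmap_set_migration_dirty(addr);
>     }
>
>     static void dirty_page_clear(ram_addr_t addr)
>     {
>         if (page_is_migration_dirty(addr)) {   /* count only 1 -> 0 changes */
>             migration_dirty_pages--;
>         }
>         bitmap_clear_migration_dirty(addr);
>     }
>
>     /* ram_save_remaining() then becomes O(1) instead of a bitmap walk: */
>     static uint64_t ram_save_remaining(void)
>     {
>         return migration_dirty_pages;
>     }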
>
> migration: ended after 151187 milliseconds
>
> same total time.
>
> samples % image name symbol name
> 365104 84.1659 qemu-system-x86_64 ram_save_block
> 32048 7.3879 qemu-system-x86_64 kvm_client_sync_dirty_bitmap
> 16033 3.6960 qemu-system-x86_64 qemu_put_byte
> 3383 0.7799 qemu-system-x86_64 qemu_put_be32
> 3028 0.6980 qemu-system-x86_64 cpu_physical_memory_reset_dirty
> 2174 0.5012 qemu-system-x86_64 qemu_put_be64
> 1953 0.4502 qemu-system-x86_64 ram_save_live
> 1408 0.3246 qemu-system-x86_64 qemu_file_rate_limit
>
> time is spent in ram_save_block() as expected.
>
> rtt min/avg/max/mdev = 474.412/492.713/539.419/21.896 ms
>
> std deviation is still better than without this.
>
>
> and now, with load on the guest!!!
> ----------------------------------
>
> I will show only the run without my patches applied and the run with all of
> them applied, at the end (as with load it takes more time to run the tests).
>
> load is synthetic:
>
> stress -c 2 -m 4 --vm-bytes 256M
>
> (2 CPU worker threads and 4 memory workers, each memory worker dirtying 256MB of RAM)
>
> Notice that we are dirtying too much memory to be able to converge with
> the default downtime of 30ms. What the migration should do is keep looping
> without stalling. To get the migration to finish, I just kill the
> stress process after several iterations through all of memory.
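>
> As a rough back-of-the-envelope check (the ~1 GB/s effective transfer
> rate below is just an assumption for illustration, not a measurement
> from this run):
>
>     memory kept dirty by stress:   4 workers * 256 MB    ~ 1024 MB
>     transferable in max_downtime:  1 GB/s * 0.030 s      ~   30 MB
>
> Roughly 1 GB stays permanently dirty but only ~30 MB may remain when we
> stop the guest, so stage 2 can never converge and has to keep iterating
> until the load is killed.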
>
> initial
> -------
>
> same stalls as without load (stalls are caused when it finds lots of
> contiguous zero pages).
>
>
> samples % image name symbol name
> 2328320 52.9645 qemu-system-x86_64 cpu_physical_memory_reset_dirty
> 1504561 34.2257 qemu-system-x86_64 ram_save_live
> 382838 8.7088 qemu-system-x86_64 ram_save_block
> 52050 1.1840 qemu-system-x86_64 cpu_get_physical_page_desc
> 48975 1.1141 qemu-system-x86_64 kvm_client_sync_dirty_bitmap
>
> rtt min/avg/max/mdev = 474.428/21033.451/134818.933/38245.396 ms, pipe 135
>
> You can see that values/results are similar to what we had.
>
> with all patches
> ----------------
>
> no stalls, I stopped it after 438 seconds
>
> samples % image name symbol name
> 387722 56.4676 qemu-system-x86_64 ram_save_block
> 109500 15.9475 qemu-system-x86_64 kvm_client_sync_dirty_bitmap
> 92328 13.4466 qemu-system-x86_64 cpu_get_physical_page_desc
> 43573 6.3459 qemu-system-x86_64 phys_page_find_alloc
> 18255 2.6586 qemu-system-x86_64 qemu_put_byte
> 3940 0.5738 qemu-system-x86_64 qemu_put_be32
> 3621 0.5274 qemu-system-x86_64 cpu_physical_memory_reset_dirty
> 2591 0.3774 qemu-system-x86_64 ram_save_live
>
> and ping gives values similar to the unloaded case.
>
> rtt min/avg/max/mdev = 474.400/486.094/548.479/15.820 ms
>
> Note:
>
> - I tested a version of these patches/algorithms with 400GB guests on
> an old qemu-kvm version (0.9.1, the one in RHEL5). With that much
> memory, the handling of the dirty bitmap is the thing that ends up
> causing stalls; I will try to retest when I get access to the machines
> again.
>
>
> QEMU tree
> ---------
>
> original qemu
> -------------
>
> savevm: save live iterate section id 2 name ram took 296 milliseconds 47 times
>
> stalls similar to qemu-kvm.
>
> migration: ended after 205938 milliseconds
>
> similar total time.
>
> samples % image name symbol name
> 2158149 72.3752 qemu-system-x86_64 cpu_physical_memory_reset_dirty
> 382016 12.8112 qemu-system-x86_64 ram_save_live
> 367000 12.3076 qemu-system-x86_64 ram_save_block
> 18012 0.6040 qemu-system-x86_64 qemu_put_byte
> 10496 0.3520 qemu-system-x86_64 kvm_client_sync_dirty_bitmap
> 7366 0.2470 qemu-system-x86_64 qemu_get_ram_ptr
>
> very bad ping times
> rtt min/avg/max/mdev = 474.424/54575.554/159139.429/54473.043 ms, pipe 160
>
>
> with all patches applied (no load)
> ----------------------------------
>
> savevm: save live iterate section id 2 name ram took 109 milliseconds 1 times
>
> only one mini-stall, and it happens during stage 3 of savevm.
>
> migration: ended after 149529 milliseconds
>
> similar time (a bit faster indeed)
>
> samples % image name symbol name
> 366803 73.9172 qemu-system-x86_64 ram_save_block
> 31717 6.3915 qemu-system-x86_64 kvm_client_sync_dirty_bitmap
> 16489 3.3228 qemu-system-x86_64 qemu_put_byte
> 5512 1.1108 qemu-system-x86_64 main_loop_wait
> 4886 0.9846 qemu-system-x86_64 cpu_exec_all
> 3418 0.6888 qemu-system-x86_64 qemu_put_be32
> 3397 0.6846 qemu-system-x86_64 kvm_vcpu_ioctl
> 3334 0.6719 [vdso] (tgid:18656 range:0x7ffff7ffe000-0x7ffff7fff000) [vdso] (tgid:18656 range:0x7ffff7ffe000-0x7ffff7fff000)
> 2913 0.5870 qemu-system-x86_64 cpu_physical_memory_reset_dirty
>
> std deviation is a bit worse than with qemu-kvm, but nothing to write home about:
> rtt min/avg/max/mdev = 475.406/485.577/909.463/40.292 ms
>
> Juan Quintela (7):
> Add spent time for migration
> Add tracepoints for savevm section start/end
> No need to iterate if we already are over the limit
> Only TCG needs TLB handling
> Only calculate expected_time for stage 2
> Exit loop if we have been there too long
> Maintaing number of dirty pages
>
> arch_init.c | 40 ++++++++++++++++++++++------------------
> cpu-all.h | 1 +
> exec-obsolete.h | 8 ++++++++
> exec.c | 33 +++++++++++++++++++++++----------
> hmp.c | 2 ++
> migration.c | 11 +++++++++++
> migration.h | 1 +
> qapi-schema.json | 12 +++++++++---
> savevm.c | 11 +++++++++++
> trace-events | 6 ++++++
> 10 files changed, 94 insertions(+), 31 deletions(-)
>