From: Chegu Vinod <chegu_vinod@hp.com>
To: Juan Quintela <quintela@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC 0/7] Fix migration with lots of memory
Date: Sun, 10 Jun 2012 20:56:35 -0700
Message-ID: <4FD56C73.8080708@hp.com>
In-Reply-To: <cover.1337710679.git.quintela@redhat.com>
Hello,
I picked up these patches a while back and ran some migration tests with
simple workloads running in the guest. The results are below.
FYI...
Vinod
----
Config Details:
Guest: 10 vCPUs, 60GB (running on a host with 6 cores (12 threads) and 64GB).
The hosts are identical x86_64 blade servers, connected via a private
10G link (for migration traffic).
Guest was started using qemu (no virsh/virt-manager etc).
Migration was initiated at the qemu monitor prompt,
and migrate_set_speed was used to set the speed to 10G. The downtime
setting was not changed.
Software:
- Guest & host OS : 3.4.0-rc7+
- Vanilla : basic upstream qemu.git
- huge_memory : changes from Juan's qemu.git tree
[ Note : BTW, I also tried version 11 of the XBZRLE patches, but ran into issues (the guest
crashed after migration). I have reported it to the author. ]
Here are the simple "workloads" and results:
1) Idling guest
2) AIM7-compute (with 2000 users).
3) 10-way parallel make (of the kernel)
4) 2 instances of memory r/w loop (exactly the same as in docs/xbzrle.txt)
5) SPECjbb2005
Note: In the Vanilla case I had instrumented ram_save_live()
to print out the total migration time and the MB's transferred.
1) Idling guest:
   Vanilla :
      Total Mig. time: 173016 ms
      Total MB's transferred : 1606 MB
   huge_memory:
      Total Mig. time: 48821 ms
      Total MB's transferred : 1620 MB
2) AIM7-compute (2000 users):
   Vanilla :
      Total Mig. time: 241124 ms
      Total MB's transferred : 4827 MB
   huge_memory:
      Total Mig. time: 66716 ms
      Total MB's transferred : 4022 MB
3) 10-way parallel make (of the Linux kernel):
   Vanilla :
      Total Mig. time: 104319 ms
      Total MB's transferred : 2316 MB
   huge_memory:
      Total Mig. time: 55105 ms
      Total MB's transferred : 2995 MB
4) 2 instances of memory r/w loop (refer to docs/xbzrle.txt):
   Vanilla :
      Total Mig. time: 112102 ms
      Total MB's transferred : 1739 MB
   huge_memory:
      Total Mig. time: 85504 ms
      Total MB's transferred : 1745 MB
5) SPECjbb2005:
   Vanilla :
      Total Mig. time: 162189 ms
      Total MB's transferred : 5461 MB
   huge_memory:
      Total Mig. time: 67787 ms
      Total MB's transferred : 8528 MB
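
For a quick side-by-side comparison, the ratios implied by the numbers above can be computed with a short script. This is only an illustrative sketch: the dictionary layout and the helper names `speedup` and `throughput_mb_s` are made up for this example, and the values are copied from the results listed above.

```python
# Migration results copied from the runs above:
# workload -> (vanilla_ms, vanilla_mb, huge_memory_ms, huge_memory_mb)
results = {
    "idle": (173016, 1606, 48821, 1620),
    "aim7-compute": (241124, 4827, 66716, 4022),
    "parallel-make": (104319, 2316, 55105, 2995),
    "memory-rw-loop": (112102, 1739, 85504, 1745),
    "specjbb": (162189, 5461, 67787, 8528),
}


def speedup(vanilla_ms, huge_ms):
    """Ratio of Vanilla to huge_memory total migration time."""
    return vanilla_ms / huge_ms


def throughput_mb_s(mb, ms):
    """Average transfer rate over the whole migration, in MB/s."""
    return mb / (ms / 1000.0)


for name, (v_ms, v_mb, h_ms, h_mb) in results.items():
    print(f"{name:15s} speedup {speedup(v_ms, h_ms):.2f}x  "
          f"vanilla {throughput_mb_s(v_mb, v_ms):.1f} MB/s  "
          f"huge_memory {throughput_mb_s(h_mb, h_ms):.1f} MB/s")
```

The average rate is taken over the whole migration, so it says nothing about how the transfer was distributed over time.
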
[Expected] Observation :
Unlike the Vanilla case (and also the XBZRLE case), with these patches I was still able
to interact with both the qemu monitor prompt and the guest during the migration
(i.e. during the iterative pre-copy phase).
------
On 5/22/2012 11:32 AM, Juan Quintela wrote:
> Hi
>
> After a long, long time, this is v2.
>
> These are basically the changes that we have for RHEL, due to the
> problems that we have with big memory machines. I just rebased the
> patches and fixed the easy parts:
>
> - buffered_file_limit is gone: we just use 50ms and call it a day
>
> - I let ram_addr_t as a valid type for a counter (no, I still don't
> agree with Anthony on this, but it is not important).
>
> - Print total time of migration always. Notice that I also print it
> when migration is completed. Luiz, could you take a look to see if
> I did something wrong (probably).
>
> - Moved debug printfs to tracepoints. Thanks a lot to Stefan for
> helping with it. Once here, I had to put the traces in the middle
> of the trace-events file; if I put them at the end of the file, then
> when I enabled them, the previous two tracepoints got generated instead
> of the ones I had just defined. Stefan is looking into that. The workaround
> is to define them anywhere else.
>
> - exit from cpu_physical_memory_reset_dirty(). Anthony wanted me to
> create an empty stub for kvm, and maintain the code for tcg. The
> problem is that we can have both kvm and tcg running from the same
> binary. Instead of exiting in the middle of the function, I just
> refactored the code out. Is there a struct where I could add a new
> function pointer for this behaviour?
>
> - exit if we have been too long on ram_save_live() loop. Anthony
> didn't like this; I will send a version based on the migration
> thread in the following days. But I just need something working for
> other people to test.
>
> Notice that I still got "lots" of more than 50ms printf's. (Yes,
> there is a debugging printf there).
>
> - Bitmap handling. Still all code to count dirty pages, will try to
> get something saner based on bitmap optimizations.
>
> Comments?
>
> Later, Juan.
>
>
>
>
> v1:
> ---
>
> Executive Summary
> -----------------
>
> This series of patches fixes migration with lots of memory. With them, stalls
> are removed, and we honor max_downtime.
> I also add infrastructure to measure what is happening during migration
> (#define DEBUG_MIGRATION and DEBUG_SAVEVM).
>
> Migration is broken at the moment in the qemu tree; Michael's patch is needed to
> fix virtio migration. Measurements are given for the qemu-kvm tree. At the end, some measurements
> with the qemu tree.
>
> Long Version with measurements (for those that like numbers O:-)
> ------------------------------
>
> 8 vCPUs and 64GB RAM, a RHEL5 guest that is completely idle
>
> initial
> -------
>
> savevm: save live iterate section id 3 name ram took 3266 milliseconds 46 times
>
> We have 46 stalls, and missed the 100ms deadline 46 times.
> stalls took around 3.5 and 3.6 seconds each.
>
> savevm: save devices took 1 milliseconds
>
> if you had any doubt (rest of devices, not RAM) took less than 1ms, so
> we don't care for now to optimize them.
>
> migration: ended after 207411 milliseconds
>
> total migration took 207 seconds for this guest
>
> samples  %        image name          symbol name
> 2161431  72.8297  qemu-system-x86_64  cpu_physical_memory_reset_dirty
> 379416   12.7845  qemu-system-x86_64  ram_save_live
> 367880   12.3958  qemu-system-x86_64  ram_save_block
> 16647    0.5609   qemu-system-x86_64  qemu_put_byte
> 10416    0.3510   qemu-system-x86_64  kvm_client_sync_dirty_bitmap
> 9013     0.3037   qemu-system-x86_64  qemu_put_be32
>
> Clearly, we are spending too much time on cpu_physical_memory_reset_dirty.
>
> ping results during the migration.
>
> rtt min/avg/max/mdev = 474.395/39772.087/151843.178/55413.633 ms, pipe 152
>
> You can see that the mean and maximum values are quite big.
>
> We got in the guests the dreaded: CPU soft lockup for 10s
>
> No need to iterate if we already are over the limit
> ---------------------------------------------------
>
> Numbers similar to previous ones.
>
> KVM don't care about TLB handling
> ---------------------------------
>
> savevm: save live iterate section id 3 name ram took 466 milliseconds 56 times
>
> 56 stalls, but much smaller, between 0.5 and 1.4 seconds
>
> migration: ended after 115949 milliseconds
>
> total time has improved a lot. 115 seconds.
>
> samples  %        image name          symbol name
> 431530   52.1152  qemu-system-x86_64  ram_save_live
> 355568   42.9414  qemu-system-x86_64  ram_save_block
> 14446    1.7446   qemu-system-x86_64  qemu_put_byte
> 11856    1.4318   qemu-system-x86_64  kvm_client_sync_dirty_bitmap
> 3281     0.3962   qemu-system-x86_64  qemu_put_be32
> 2426     0.2930   qemu-system-x86_64  cpu_physical_memory_reset_dirty
> 2180     0.2633   qemu-system-x86_64  qemu_put_be64
>
> notice how cpu_physical_memory_reset_dirty() uses much less time.
>
> rtt min/avg/max/mdev = 474.438/1529.387/15578.055/2595.186 ms, pipe 16
>
> ping values from outside to the guest have improved a bit, but still
> bad.
>
> Exit loop if we have been there too long
> ----------------------------------------
>
> not a single stall bigger than 100ms
>
> migration: ended after 157511 milliseconds
>
> not as good time as previous one, but we have removed the stalls.
>
> samples  %        image name          symbol name
> 1104546  71.8260  qemu-system-x86_64  ram_save_live
> 370472   24.0909  qemu-system-x86_64  ram_save_block
> 30419    1.9781   qemu-system-x86_64  kvm_client_sync_dirty_bitmap
> 16252    1.0568   qemu-system-x86_64  qemu_put_byte
> 3400     0.2211   qemu-system-x86_64  qemu_put_be32
> 2657     0.1728   qemu-system-x86_64  cpu_physical_memory_reset_dirty
> 2206     0.1435   qemu-system-x86_64  qemu_put_be64
> 1559     0.1014   qemu-system-x86_64  qemu_file_rate_limit
>
>
> You can see that ping times are improving
> rtt min/avg/max/mdev = 474.422/504.416/628.508/35.366 ms
>
> now the maximum is near the minimum, in reasonable values.
>
> The limit in the stage loop has been set to 50ms because
> buffered_file runs a timer every 100ms. If we miss that timer, we end up
> having trouble. So, I put 100/2.
>
> I tried other values: 15ms (max_downtime/2, so it could be set by the
> user), but it gave too much total time (~400 seconds).
>
> I tried bigger values, 75ms and 100ms, but with either of them we got
> stalls, sometimes as big as 1s, as we lose some timer runs, and then
> the calculations are wrong.
>
> With this patch, the soft lockups are gone.
>
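
To illustrate the idea in the section quoted above, here is a minimal sketch of a loop that gives up once it has used its time budget. It is not the code from the patch: `send_until_deadline`, `send_page` and the injectable `clock` parameter are hypothetical names chosen for this example, and only the 50ms figure comes from the text.

```python
import time

MAX_WAIT_MS = 50  # half of the 100 ms buffered_file timer period mentioned above


def send_until_deadline(pages, send_page, max_wait_ms=MAX_WAIT_MS,
                        clock=time.monotonic):
    """Send pages until none are left or the time budget is used up.

    Returns the number of pages sent in this call; the caller is
    expected to call again later for whatever remains in `pages`.
    """
    start = clock()
    sent = 0
    while pages:
        send_page(pages.pop())
        sent += 1
        # Check the elapsed time after every page so that a long run of
        # work cannot hold the caller past the budget.
        if (clock() - start) * 1000.0 > max_wait_ms:
            break
    return sent
```

Because the check happens after each page, a call can overshoot the budget by at most the time needed to send one page.
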
> Change calculation to exit live migration
> -----------------------------------------
>
> we spent too much time on ram_save_live(), the problem is the
> calculation of number of dirty pages (ram_save_remaining()). Instead
> of walking the bitmap each time that we need the value, we just
> maintain the number of dirty pages each time that we change one value
> in the bitmap.
>
> migration: ended after 151187 milliseconds
>
> same total time.
>
> samples  %        image name          symbol name
> 365104   84.1659  qemu-system-x86_64  ram_save_block
> 32048    7.3879   qemu-system-x86_64  kvm_client_sync_dirty_bitmap
> 16033    3.6960   qemu-system-x86_64  qemu_put_byte
> 3383     0.7799   qemu-system-x86_64  qemu_put_be32
> 3028     0.6980   qemu-system-x86_64  cpu_physical_memory_reset_dirty
> 2174     0.5012   qemu-system-x86_64  qemu_put_be64
> 1953     0.4502   qemu-system-x86_64  ram_save_live
> 1408     0.3246   qemu-system-x86_64  qemu_file_rate_limit
>
> time is spent in ram_save_block() as expected.
>
> rtt min/avg/max/mdev = 474.412/492.713/539.419/21.896 ms
>
> std deviation is still better than without this.
>
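
The approach described in the section quoted above, keeping a running count instead of walking the bitmap, can be sketched roughly as follows. This is an illustration under assumptions rather than the patch itself: the `DirtyBitmap` class and its method names are hypothetical, and it uses one byte per page purely for readability.

```python
class DirtyBitmap:
    """Page-granular dirty map that keeps a running count of dirty pages."""

    def __init__(self, num_pages):
        self.bits = bytearray(num_pages)  # one byte per page, for clarity
        self.dirty_count = 0

    def set_dirty(self, page):
        # Only count a transition from clean to dirty.
        if not self.bits[page]:
            self.bits[page] = 1
            self.dirty_count += 1

    def clear_dirty(self, page):
        # Only count a transition from dirty to clean.
        if self.bits[page]:
            self.bits[page] = 0
            self.dirty_count -= 1

    def remaining(self):
        """O(1): return the maintained counter."""
        return self.dirty_count

    def remaining_by_walk(self):
        """O(n) reference: count dirty entries by scanning every page."""
        return sum(self.bits)
```

The counter stays correct only if every change to the map goes through `set_dirty()` or `clear_dirty()`; any code path that writes the map directly would make the two counts drift apart.
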
>
> and now, with load on the guest!!!
> ----------------------------------
>
> will show only without my patches applied, and at the end (as with
> load it takes more time to run the tests).
>
> load is synthetic:
>
> stress -c 2 -m 4 --vm-bytes 256M
>
> (2 cpu threads and two memory threads dirtying each 256MB RAM)
>
> Notice that we are dirtying too much memory to be able to migrate with
> the default downtime of 30ms. What the migration should do is loop over
> but without having stalls. To get the migration ending, I just kill the
> stress process after several iterations through all memory.
>
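
As a rough illustration of why such a guest cannot converge, the check implied by the paragraph quoted above compares the time needed to send what is still dirty against the allowed downtime. The function names and the bandwidth figure below are assumptions made for this sketch; only the 30ms default downtime comes from the text.

```python
def expected_downtime_ms(remaining_bytes, bandwidth_bytes_per_ms):
    """Time needed to send what is still dirty at the measured bandwidth."""
    return remaining_bytes / bandwidth_bytes_per_ms


def can_finish(remaining_bytes, bandwidth_bytes_per_ms, max_downtime_ms=30):
    """True when the final stop-and-copy pass would fit in the downtime budget."""
    return expected_downtime_ms(remaining_bytes,
                                bandwidth_bytes_per_ms) <= max_downtime_ms


# Hypothetical numbers: roughly 1 GB/s of usable bandwidth.
bandwidth = 1_000_000  # bytes per millisecond
print(can_finish(20 * 1000 * 1000, bandwidth))   # 20 MB still dirty
print(can_finish(512 * 1000 * 1000, bandwidth))  # 512 MB still dirty
```

With these assumed numbers, a 30ms budget covers about 30 MB, so a workload that keeps hundreds of MB dirty never passes the check until the load stops.
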
> initial
> -------
>
> same stalls as without load (stalls are caused when it finds lots of
> contiguous zero pages).
>
>
> samples  %        image name          symbol name
> 2328320  52.9645  qemu-system-x86_64  cpu_physical_memory_reset_dirty
> 1504561  34.2257  qemu-system-x86_64  ram_save_live
> 382838   8.7088   qemu-system-x86_64  ram_save_block
> 52050    1.1840   qemu-system-x86_64  cpu_get_physical_page_desc
> 48975    1.1141   qemu-system-x86_64  kvm_client_sync_dirty_bitmap
>
> rtt min/avg/max/mdev = 474.428/21033.451/134818.933/38245.396 ms, pipe 135
>
> You can see that values/results are similar to what we had.
>
> with all patches
> ----------------
>
> no stalls, I stopped it after 438 seconds
>
> samples  %        image name          symbol name
> 387722   56.4676  qemu-system-x86_64  ram_save_block
> 109500   15.9475  qemu-system-x86_64  kvm_client_sync_dirty_bitmap
> 92328    13.4466  qemu-system-x86_64  cpu_get_physical_page_desc
> 43573    6.3459   qemu-system-x86_64  phys_page_find_alloc
> 18255    2.6586   qemu-system-x86_64  qemu_put_byte
> 3940     0.5738   qemu-system-x86_64  qemu_put_be32
> 3621     0.5274   qemu-system-x86_64  cpu_physical_memory_reset_dirty
> 2591     0.3774   qemu-system-x86_64  ram_save_live
>
> and ping gives similar values to the unloaded one.
>
> rtt min/avg/max/mdev = 474.400/486.094/548.479/15.820 ms
>
> Note:
>
> - I tested a version of these patches/algorithms with 400GB guests with
> an old qemu-kvm version (0.9.1, the one in RHEL5). With so much
> memory, the handling of the dirty bitmap is the thing that ends up
> causing stalls; I will try to retest when I get access to the machines
> again.
>
>
> QEMU tree
> ---------
>
> original qemu
> -------------
>
> savevm: save live iterate section id 2 name ram took 296 milliseconds 47 times
>
> stalls similar to qemu-kvm.
>
> migration: ended after 205938 milliseconds
>
> similar total time.
>
> samples  %        image name          symbol name
> 2158149  72.3752  qemu-system-x86_64  cpu_physical_memory_reset_dirty
> 382016   12.8112  qemu-system-x86_64  ram_save_live
> 367000   12.3076  qemu-system-x86_64  ram_save_block
> 18012    0.6040   qemu-system-x86_64  qemu_put_byte
> 10496    0.3520   qemu-system-x86_64  kvm_client_sync_dirty_bitmap
> 7366     0.2470   qemu-system-x86_64  qemu_get_ram_ptr
>
> very bad ping times
> rtt min/avg/max/mdev = 474.424/54575.554/159139.429/54473.043 ms, pipe 160
>
>
> with all patches applied (no load)
> ----------------------------------
>
> savevm: save live iterate section id 2 name ram took 109 milliseconds 1 times
>
> only one mini-stall, it is during stage 3 of savevm.
>
> migration: ended after 149529 milliseconds
>
> similar time (a bit faster indeed)
>
> samples  %        image name          symbol name
> 366803   73.9172  qemu-system-x86_64  ram_save_block
> 31717    6.3915   qemu-system-x86_64  kvm_client_sync_dirty_bitmap
> 16489    3.3228   qemu-system-x86_64  qemu_put_byte
> 5512     1.1108   qemu-system-x86_64  main_loop_wait
> 4886     0.9846   qemu-system-x86_64  cpu_exec_all
> 3418     0.6888   qemu-system-x86_64  qemu_put_be32
> 3397     0.6846   qemu-system-x86_64  kvm_vcpu_ioctl
> 3334     0.6719   [vdso] (tgid:18656 range:0x7ffff7ffe000-0x7ffff7fff000)  [vdso] (tgid:18656 range:0x7ffff7ffe000-0x7ffff7fff000)
> 2913     0.5870   qemu-system-x86_64  cpu_physical_memory_reset_dirty
>
> std deviation is a bit worse than qemu-kvm, but nothing to write home about
> rtt min/avg/max/mdev = 475.406/485.577/909.463/40.292 ms
>
> Juan Quintela (7):
> Add spent time for migration
> Add tracepoints for savevm section start/end
> No need to iterate if we already are over the limit
> Only TCG needs TLB handling
> Only calculate expected_time for stage 2
> Exit loop if we have been there too long
> Maintaing number of dirty pages
>
> arch_init.c | 40 ++++++++++++++++++++++------------------
> cpu-all.h | 1 +
> exec-obsolete.h | 8 ++++++++
> exec.c | 33 +++++++++++++++++++++++----------
> hmp.c | 2 ++
> migration.c | 11 +++++++++++
> migration.h | 1 +
> qapi-schema.json | 12 +++++++++---
> savevm.c | 11 +++++++++++
> trace-events | 6 ++++++
> 10 files changed, 94 insertions(+), 31 deletions(-)
>