* [Qemu-devel] setting migrate_downtime results in halted vm
From: Stefan Priebe @ 2012-12-27 21:54 UTC
To: qemu-devel

Hello list,

I'm using qemu 1.3, and migration works fine as long as I do not set
migrate_downtime. If I set migrate_downtime to 1s, 0.5s or 0.3s, the VM
halts immediately, I cannot even connect to the QMP socket anymore, and the
migration takes 5-10 minutes or never finishes. I see high CPU usage on the
source VM while it hangs.

The VM was idle for all tests and was using 1GB of its 4GB of memory.

Do you have any ideas what happens here?

Greets,
Stefan
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Alexandre DERUMIER @ 2012-12-28 7:05 UTC
To: Stefan Priebe, Juan Quintela
Cc: Paolo Bonzini, qemu-devel

Hi list,

after discussing with Stefan yesterday, here is some more info:

(This is for stable qemu 1.3; it was working fine with qemu 1.2.)

The problem seems to be that when setting migrate_set_downtime to 1 sec, the
transfer sends all of the VM's memory in one step instead of incrementally,
so the downtime is really huge: 90000 ms for 1GB of memory. I think the
monitor doesn't respond during this last transfer step.

With migrate_set_downtime 30 ms, the memory is correctly sent incrementally,
and the last step results in around 500 ms of downtime.

I can't reproduce it myself; the only difference seems to be Stefan's 10GbE
network (migrate_set_speed is set to 8589934592). Stefan can reproduce it
100%.

I think something must be wrong in arch_init.c, but I'm not an expert on the
qemu migration code. So if you have any ideas/clues....

Regards,

Alexandre

migrate_set_downtime: 1
-----------------------
Dec 27 12:57:11 starting migration of VM 105 to node 'cloud1-1202' (10.255.0.20)
Dec 27 12:57:11 copying disk images
Dec 27 12:57:11 starting VM 105 on remote node 'cloud1-1202'
Dec 27 12:57:15 starting migration tunnel
Dec 27 12:57:15 starting online/live migration on port 60000
Dec 27 12:57:15 migrate_set_speed: 8589934592
Dec 27 12:57:15 migrate_set_downtime: 1
-> query-migrate: monitor hang
-> query-migrate: monitor hang
-> query-migrate: monitor hang
-> query-migrate: (finished) Dec 27 12:58:45 migration speed: 22.76 MB/s - downtime 90004 ms
Dec 27 12:58:45 migration status: completed
Dec 27 12:58:49 migration finished successfuly (duration 00:01:38)
TASK OK

migrate_set_downtime: 0.03
--------------------------
The same again with 1GB of memory and migrate_downtime set to 0.03 (cached mem):

Dec 27 13:00:19 starting migration of VM 105 to node 'cloud1-1203' (10.255.0.22)
Dec 27 13:00:19 copying disk images
Dec 27 13:00:19 starting VM 105 on remote node 'cloud1-1203'
Dec 27 13:00:22 starting migration tunnel
Dec 27 13:00:23 starting online/live migration on port 60000
Dec 27 13:00:23 migrate_set_speed: 8589934592
Dec 27 13:00:23 migrate_set_downtime: 0.03
-> query-migrate: Dec 27 13:00:25 migration status: active (transferred 404647386, remaining 680390656), total 2156265472), expected downtime 190
-> query-migrate: Dec 27 13:00:27 migration status: active (transferred 880582320, remaining 203579392), total 2156265472), expected downtime 53
-> query-migrate: (status finished) Dec 27 13:00:29 migration speed: 341.33 MB/s - downtime 490 ms
Dec 27 13:00:29 migration status: completed
Dec 27 13:00:32 migration finished successfuly (duration 00:00:13)
TASK OK

----- Original Mail -----
From: "Stefan Priebe" <s.priebe@profihost.ag>
To: "qemu-devel" <qemu-devel@nongnu.org>
Sent: Thursday, 27 December 2012 22:54:55
Subject: [Qemu-devel] setting migrate_downtime results in halted vm

Hello list,

I'm using qemu 1.3, and migration works fine as long as I do not set
migrate_downtime. If I set migrate_downtime to 1s, 0.5s or 0.3s, the VM
halts immediately, I cannot even connect to the QMP socket anymore, and the
migration takes 5-10 minutes or never finishes. I see high CPU usage on the
source VM while it hangs.

The VM was idle for all tests and was using 1GB of its 4GB of memory.

Do you have any ideas what happens here?

Greets,
Stefan
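[Editorial aside] For anyone trying to reproduce this by hand, the knobs and values in the logs above map onto plain QMP commands of that era. Below is a minimal sketch of driving them from Python; the socket path and destination URI are made up for illustration (Proxmox wires up its own per-VM sockets and tunnels), while the speed/downtime values are the ones from the logs:

import json, socket

# Connect to the VM's QMP socket (the path is hypothetical; adjust to your setup).
sock = socket.socket(socket.AF_UNIX)
sock.connect("/var/run/qemu-server/105.qmp")
f = sock.makefile("rw")

def qmp(cmd, **args):
    """Send one QMP command and return its reply, skipping async events."""
    f.write(json.dumps({"execute": cmd, "arguments": args}) + "\n")
    f.flush()
    while True:
        msg = json.loads(f.readline())
        if "return" in msg or "error" in msg:
            return msg

json.loads(f.readline())                    # discard the QMP greeting banner
qmp("qmp_capabilities")
qmp("migrate_set_speed", value=8589934592)  # same value as in the logs (8 GiB/s, effectively unlimited)
qmp("migrate_set_downtime", value=1)        # seconds; the setting that triggers the hang
qmp("migrate", uri="tcp:10.255.0.20:60000") # destination URI is illustrative
print(qmp("query-migrate"))                 # on the affected 1.3 build this is where the monitor stalls

With the affected build, the query-migrate call at the end is the point where the monitor stops answering, matching the "monitor hang" annotations in the log above.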
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Paolo Bonzini @ 2012-12-28 17:53 UTC
To: Alexandre DERUMIER
Cc: Juan Quintela, qemu-devel, Stefan Priebe

On 28/12/2012 08:05, Alexandre DERUMIER wrote:
> Hi list,
> after discussing with Stefan yesterday, here is some more info:
>
> (This is for stable qemu 1.3; it was working fine with qemu 1.2.)
>
> The problem seems to be that when setting migrate_set_downtime to 1 sec,
> the transfer sends all of the VM's memory in one step instead of
> incrementally, so the downtime is really huge: 90000 ms for 1GB of memory.

Can you try commit bde54c08b4854aceee3dee25121a2b835cb81166?

Paolo
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Stefan Priebe @ 2012-12-28 19:03 UTC
To: Paolo Bonzini
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

Hi Paolo,

On 28.12.2012 18:53, Paolo Bonzini wrote:
> On 28/12/2012 08:05, Alexandre DERUMIER wrote:
>> Hi list,
>> after discussing with Stefan yesterday, here is some more info:
>>
>> (This is for stable qemu 1.3; it was working fine with qemu 1.2.)
>>
>> The problem seems to be that when setting migrate_set_downtime to 1 sec,
>> the transfer sends all of the VM's memory in one step instead of
>> incrementally, so the downtime is really huge: 90000 ms for 1GB of memory.
>
> Can you try commit bde54c08b4854aceee3dee25121a2b835cb81166?

I cherry-picked that one on top of 1.3; sadly it does not help. The VM
halts, the monitor socket is no longer available, and the kvm process is
running with 100% CPU on the source side.

Greets,
Stefan
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Paolo Bonzini @ 2012-12-29 14:00 UTC
To: Stefan Priebe
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

On 28/12/2012 20:03, Stefan Priebe wrote:
> Hi Paolo,
>
> On 28.12.2012 18:53, Paolo Bonzini wrote:
>> On 28/12/2012 08:05, Alexandre DERUMIER wrote:
>>> Hi list,
>>> after discussing with Stefan yesterday, here is some more info:
>>>
>>> (This is for stable qemu 1.3; it was working fine with qemu 1.2.)
>>>
>>> The problem seems to be that when setting migrate_set_downtime to 1 sec,
>>> the transfer sends all of the VM's memory in one step instead of
>>> incrementally, so the downtime is really huge: 90000 ms for 1GB of
>>> memory.
>>
>> Can you try commit bde54c08b4854aceee3dee25121a2b835cb81166?
>
> I cherry-picked that one on top of 1.3; sadly it does not help. The VM
> halts, the monitor socket is no longer available, and the kvm process is
> running with 100% CPU on the source side.

Can you please test master and, if it works, bisect it in reverse? (That is,
mark "bad" if it works, "good" if it fails.)

Paolo
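[Editorial aside] A reverse bisection like this can also be driven by `git bisect run` with a small helper whose exit code is deliberately inverted. The sketch below is only an illustration: the build options are a guess, and the actual migration test is left as a manual step because it needs a second host.

#!/usr/bin/env python3
# reverse-bisect.py: helper for `git bisect run` when hunting for the commit that
# FIXED a bug. A revision where migration works is reported as "bad", and one
# where it still hangs is reported as "good". Build flags and the manual test
# prompt are placeholders; adapt them to the local setup.
import subprocess, sys

subprocess.check_call(["./configure", "--target-list=x86_64-softmmu"])
subprocess.check_call(["make", "-j8"])

# The migration test itself (set migrate_downtime to 1s and migrate the idle VM)
# has to be run by hand against a second host, so just ask for the verdict.
answer = input("Did the migration finish without the monitor hanging? [y/n] ").strip().lower()

# Inverted meaning: exit 0 tells bisect "good" (still broken), exit 1 means "bad" (already fixed).
sys.exit(1 if answer.startswith("y") else 0)

It would be started with the newer, working revision marked "bad", for example `git bisect start master v1.3.0`, followed by `git bisect run ./reverse-bisect.py`.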
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Stefan Priebe @ 2012-12-29 14:05 UTC
To: Paolo Bonzini
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

Hi Paolo,

On 29.12.2012 15:00, Paolo Bonzini wrote:
>> I cherry-picked that one on top of 1.3; sadly it does not help. The VM
>> halts, the monitor socket is no longer available, and the kvm process is
>> running with 100% CPU on the source side.
>
> Can you please test master and, if it works, bisect it in reverse? (That
> is, mark "bad" if it works, "good" if it fails.)

It's working fine with qemu master.

I then walked through the patchset here:
http://www.mail-archive.com/qemu-commits@nongnu.org/msg02028.html

and backported it to qemu 1.3 step by step.

It starts working for me after the first 22 patches (after introducing the
new mutex and threading for writes).

Greets,
Stefan
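[Editorial aside] A step-by-step backport of that kind can be scripted so that each intermediate state is tagged and testable. The sketch below only illustrates the workflow; the patch directory, branch name, and tag names are invented, and the real series would first be exported from master with `git format-patch`.

#!/usr/bin/env python3
# Apply an exported patch series onto a 1.3 branch one patch at a time, tagging
# each step so the first patch after which migration behaves can be identified.
# Paths and names are placeholders for illustration.
import glob, subprocess

patches = sorted(glob.glob("migration-series/*.patch"))

subprocess.check_call(["git", "checkout", "-B", "backport-test", "v1.3.0"])

for i, patch in enumerate(patches, start=1):
    subprocess.check_call(["git", "am", "-3", patch])           # 3-way apply helps on the older 1.3 base
    subprocess.check_call(["git", "tag", "-f", f"backport-{i:02d}"])
    print(f"applied {i}/{len(patches)}: {patch} - build and rerun the migration test")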
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Paolo Bonzini @ 2012-12-29 14:58 UTC
To: Stefan Priebe
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

On 29/12/2012 15:05, Stefan Priebe wrote:
> Hi Paolo,
> On 29.12.2012 15:00, Paolo Bonzini wrote:
>>> I cherry-picked that one on top of 1.3; sadly it does not help. The VM
>>> halts, the monitor socket is no longer available, and the kvm process
>>> is running with 100% CPU on the source side.
>>
>> Can you please test master and, if it works, bisect it in reverse? (That
>> is, mark "bad" if it works, "good" if it fails.)
>
> It's working fine with qemu master.
>
> I then walked through the patchset here:
> http://www.mail-archive.com/qemu-commits@nongnu.org/msg02028.html
>
> and backported it to qemu 1.3 step by step.
>
> It starts working for me after the first 22 patches (after introducing
> the new mutex and threading for writes).

And when does it break in 1.3?

I suppose it will be between 05e72dc5812a9f461fc2c606dff2572909eafc39 and
aa723c23147e93fef8475bd80fd29e633378c34d.

Probably at 2dddf6f4133975af62e64cb6406ec1239491fa89, which was supposed to
be placed before the switch to the migration thread (or even squashed into
it) but ended up much earlier when the project moved from me to Juan.

Paolo
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Stefan Priebe @ 2012-12-29 15:19 UTC
To: Paolo Bonzini
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

On 29.12.2012 15:58, Paolo Bonzini wrote:
> On 29/12/2012 15:05, Stefan Priebe wrote:
>> It starts working for me after the first 22 patches (after introducing
>> the new mutex and threading for writes).
>
> And when does it break in 1.3?
>
> I suppose it will be between 05e72dc5812a9f461fc2c606dff2572909eafc39
> and aa723c23147e93fef8475bd80fd29e633378c34d.
>
> Probably at 2dddf6f4133975af62e64cb6406ec1239491fa89, which was supposed
> to be placed before the switch to the migration thread (or even squashed
> into it) but ended up much earlier when the project moved from me to Juan.

You mean by bisecting between qemu 1.2 and 1.3? I retested qemu 1.2, and it
wasn't 100% working for me there either. It worked fine up to
migrate_downtime 1s, but it breaks / the VM just halts when I set it to 2s
with qemu 1.2. So I don't really know where to start bisecting, as I have NO
version where it worked perfectly, except qemu 1.3 with the patches
backported from 1.4; that works fine.

Greets,
Stefan
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Paolo Bonzini @ 2012-12-29 15:25 UTC
To: Stefan Priebe
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

On 29/12/2012 16:19, Stefan Priebe wrote:
>> I suppose it will be between 05e72dc5812a9f461fc2c606dff2572909eafc39
>> and aa723c23147e93fef8475bd80fd29e633378c34d.
>>
>> Probably at 2dddf6f4133975af62e64cb6406ec1239491fa89, which was supposed
>> to be placed before the switch to the migration thread (or even squashed
>> into it) but ended up much earlier when the project moved from me to
>> Juan.
>
> You mean by bisecting between qemu 1.2 and 1.3? I retested qemu 1.2, and
> it wasn't 100% working for me there either. It worked fine up to
> migrate_downtime 1s, but it breaks / the VM just halts when I set it to
> 2s with qemu 1.2. So I don't really know where to start bisecting, as I
> have NO version where it worked perfectly, except qemu 1.3 with the
> patches backported from 1.4; that works fine.

Bisect between the two commits I gave above. There will probably be a place
where it starts failing reliably.

Paolo
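[Editorial aside] Concretely, and assuming (as the messages above imply) that 05e72dc58... still works while aa723c231... already fails, this second, forward bisect could be started as sketched below; the rest is ordinary good/bad marking, or the same kind of run script as before but with the normal exit-code meaning.

#!/usr/bin/env python3
# Start a forward bisect inside the v1.2..v1.3 migration history: the newer
# commit is assumed to fail ("bad"), the older one to work ("good").
import subprocess

subprocess.check_call(["git", "bisect", "start",
                       "aa723c23147e93fef8475bd80fd29e633378c34d",   # bad: migration hangs
                       "05e72dc5812a9f461fc2c606dff2572909eafc39"])  # good: migration works
# After building and testing each checkout, mark it with
# `git bisect good` or `git bisect bad` until the first failing commit is found.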
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Stefan Priebe @ 2012-12-31 13:25 UTC
To: Paolo Bonzini
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

When I try to cancel a running migration with qemu git, I get a segfault.

BT:
(gdb) bt
#0 _wordcopy_bwd_aligned (dstp=140051233112024, srcp=140051233112016, len=529920) at wordcopy.c:298
#1 0x00007f61dd7c86da in *__GI_memmove (dest=0x7f6037bf5010, src=<optimized out>, len=38118264) at memmove.c:99
#2 0x00007f61e2e973c9 in buffered_flush (s=0x7f61e33a9e60) at migration.c:546
#3 0x00007f61e2e9746c in buffered_close (opaque=0x7f61e33a9e60) at migration.c:598
#4 0x00007f61e2f758ff in qemu_fclose (f=0x7f6044fc3200) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/savevm.c:543
#5 0x00007f61e2e975b6 in migrate_fd_cleanup (s=0x7f61e33a9e60) at migration.c:277
#6 0x00007f61e2f7406b in handle_user_command (mon=0x7fffce7e3a90, cmdline=<optimized out>) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:3945
#7 0x00007f61e2f74279 in qmp_human_monitor_command (command_line=0x7f604c9361b0 "migrate_cancel", has_cpu_index=false, cpu_index=140051576672336, errp=0x7fffce7e3f68) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:664
#8 0x00007f61e2ecec07 in qmp_marshal_input_human_monitor_command (mon=<optimized out>, qdict=<optimized out>, ret=0x7fffce7e3ff0) at qmp-marshal.c:1505
#9 0x00007f61e2f6f53f in qmp_call_cmd (params=<optimized out>, cmd=<optimized out>, mon=<optimized out>) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:4446
#10 handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:4512
#11 0x00007f61e2e9039c in json_message_process_token (lexer=0x7f61d0012470, token=0x7f60389d6c60, type=JSON_OPERATOR, x=<optimized out>, y=<optimized out>) at json-streamer.c:87
#12 0x00007f61e2e8ec60 in json_lexer_feed_char (lexer=0x7f61d0012470, ch=125 '}', flush=false) at json-lexer.c:303
#13 0x00007f61e2e8ee19 in json_lexer_feed (lexer=0x7f61d0012470, buffer=0x7fffce7e41f0 "}\277\370M`\177", size=1) at json-lexer.c:356
#14 0x00007f61e2f6d65e in monitor_control_read (opaque=<optimized out>, buf=0x7f6040000000 " ", size=529920) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:4533
#15 0x00007f61e2ebedab in tcp_chr_read (opaque=0x7f61e4e1e610) at qemu-char.c:2325
#16 0x00007f61e2e8dac7 in qemu_iohandler_poll (readfds=0x7f61e37bc660, writefds=0x7f61e37bc6e0, xfds=<optimized out>, ret=<optimized out>) at iohandler.c:124
#17 0x00007f61e2e95f79 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:418
#18 0x00007f61e2f0f56c in main_loop () at vl.c:1768
#19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4047

Stefan

On 29.12.2012 16:25, Paolo Bonzini wrote:
> On 29/12/2012 16:19, Stefan Priebe wrote:
>>> I suppose it will be between 05e72dc5812a9f461fc2c606dff2572909eafc39
>>> and aa723c23147e93fef8475bd80fd29e633378c34d.
>>>
>>> Probably at 2dddf6f4133975af62e64cb6406ec1239491fa89, which was
>>> supposed to be placed before the switch to the migration thread (or
>>> even squashed into it) but ended up much earlier when the project
>>> moved from me to Juan.
>>
>> You mean by bisecting between qemu 1.2 and 1.3? I retested qemu 1.2,
>> and it wasn't 100% working for me there either. It worked fine up to
>> migrate_downtime 1s, but it breaks / the VM just halts when I set it to
>> 2s with qemu 1.2. So I don't really know where to start bisecting, as I
>> have NO version where it worked perfectly, except qemu 1.3 with the
>> patches backported from 1.4; that works fine.
>
> Bisect between the two commits I gave above. There will probably be a
> place where it starts failing reliably.
>
> Paolo
end of thread, newest: 2012-12-31 13:25 UTC

Thread overview: 10+ messages
2012-12-27 21:54 [Qemu-devel] setting migrate_downtime results in halted vm Stefan Priebe
2012-12-28  7:05 ` [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3) Alexandre DERUMIER
2012-12-28 17:53   ` Paolo Bonzini
2012-12-28 19:03     ` Stefan Priebe
2012-12-29 14:00       ` Paolo Bonzini
2012-12-29 14:05         ` Stefan Priebe
2012-12-29 14:58           ` Paolo Bonzini
2012-12-29 15:19             ` Stefan Priebe
2012-12-29 15:25               ` Paolo Bonzini
2012-12-31 13:25                 ` Stefan Priebe