* [Qemu-devel] setting migrate_downtime results in halted vm
From: Stefan Priebe @ 2012-12-27 21:54 UTC
To: qemu-devel

Hello list,

I'm using qemu 1.3, and migration works fine as long as I do not set
migrate_downtime. If I set migrate_downtime to 1s, 0.5s or 0.3s, the VM
halts immediately, I cannot even connect to the QMP socket anymore, and the
migration takes 5-10 minutes or never finishes. I see high CPU usage on the
source VM while it hangs.

The VM was idle for all tests and was using 1GB of its 4GB of memory.

Do you have any ideas what happens here?

Greets,
Stefan
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Alexandre DERUMIER @ 2012-12-28 7:05 UTC
To: Stefan Priebe, Juan Quintela
Cc: Paolo Bonzini, qemu-devel

Hi list,

after discussing with Stefan yesterday, here is some more info:

(This is for stable qemu 1.3; it was working fine with qemu 1.2.)

The problem seems to be that when setting migrate_set_downtime to 1 sec, the
transfer sends all of the VM's memory in one step instead of incrementally,
so the downtime is really huge: 90000 ms for 1GB of memory. I think the
monitor doesn't respond during this last transfer step.

With migrate_set_downtime 30 ms, the memory is correctly sent incrementally,
and the last step results in around 500 ms of downtime.

I can't reproduce it myself; the only difference seems to be Stefan's 10GbE
network (migrate_set_speed is set to 8589934592). Stefan can reproduce it
100%.

I think something must be wrong in arch_init.c, but I'm not an expert on the
qemu migration code. So if you have any ideas/clues....

Regards,

Alexandre

migrate_set_downtime: 1
-----------------------
Dec 27 12:57:11 starting migration of VM 105 to node 'cloud1-1202' (10.255.0.20)
Dec 27 12:57:11 copying disk images
Dec 27 12:57:11 starting VM 105 on remote node 'cloud1-1202'
Dec 27 12:57:15 starting migration tunnel
Dec 27 12:57:15 starting online/live migration on port 60000
Dec 27 12:57:15 migrate_set_speed: 8589934592
Dec 27 12:57:15 migrate_set_downtime: 1
-> query-migrate: monitor hang
-> query-migrate: monitor hang
-> query-migrate: monitor hang
-> query-migrate: (finished) Dec 27 12:58:45 migration speed: 22.76 MB/s - downtime 90004 ms
Dec 27 12:58:45 migration status: completed
Dec 27 12:58:49 migration finished successfuly (duration 00:01:38)
TASK OK

migrate_set_downtime: 0.03
--------------------------
The same again with 1GB of memory and migrate_downtime set to 0.03 (cached mem):

Dec 27 13:00:19 starting migration of VM 105 to node 'cloud1-1203' (10.255.0.22)
Dec 27 13:00:19 copying disk images
Dec 27 13:00:19 starting VM 105 on remote node 'cloud1-1203'
Dec 27 13:00:22 starting migration tunnel
Dec 27 13:00:23 starting online/live migration on port 60000
Dec 27 13:00:23 migrate_set_speed: 8589934592
Dec 27 13:00:23 migrate_set_downtime: 0.03
-> query-migrate: Dec 27 13:00:25 migration status: active (transferred 404647386, remaining 680390656), total 2156265472), expected downtime 190
-> query-migrate: Dec 27 13:00:27 migration status: active (transferred 880582320, remaining 203579392), total 2156265472), expected downtime 53
-> query-migrate: (status finished) Dec 27 13:00:29 migration speed: 341.33 MB/s - downtime 490 ms
Dec 27 13:00:29 migration status: completed
Dec 27 13:00:32 migration finished successfuly (duration 00:00:13)
TASK OK

----- Original Mail -----
From: "Stefan Priebe" <s.priebe@profihost.ag>
To: "qemu-devel" <qemu-devel@nongnu.org>
Sent: Thursday, 27 December 2012 22:54:55
Subject: [Qemu-devel] setting migrate_downtime results in halted vm

Hello list,

I'm using qemu 1.3, and migration works fine as long as I do not set
migrate_downtime. If I set migrate_downtime to 1s, 0.5s or 0.3s, the VM
halts immediately, I cannot even connect to the QMP socket anymore, and the
migration takes 5-10 minutes or never finishes. I see high CPU usage on the
source VM while it hangs.

The VM was idle for all tests and was using 1GB of its 4GB of memory.

Do you have any ideas what happens here?

Greets,
Stefan
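[Editorial aside] For anyone trying to reproduce this by hand, the knobs and values in the logs above map onto plain QMP commands of that era. Below is a minimal sketch of driving them from Python; the socket path and destination URI are made up for illustration (Proxmox wires up its own per-VM sockets and tunnels), while the speed/downtime values are the ones from the logs:

import json, socket

# Connect to the VM's QMP socket (the path is hypothetical; adjust to your setup).
sock = socket.socket(socket.AF_UNIX)
sock.connect("/var/run/qemu-server/105.qmp")
f = sock.makefile("rw")

def qmp(cmd, **args):
    """Send one QMP command and return its reply, skipping async events."""
    f.write(json.dumps({"execute": cmd, "arguments": args}) + "\n")
    f.flush()
    while True:
        msg = json.loads(f.readline())
        if "return" in msg or "error" in msg:
            return msg

json.loads(f.readline())                    # discard the QMP greeting banner
qmp("qmp_capabilities")
qmp("migrate_set_speed", value=8589934592)  # same value as in the logs (8 GiB/s, effectively unlimited)
qmp("migrate_set_downtime", value=1)        # seconds; the setting that triggers the hang
qmp("migrate", uri="tcp:10.255.0.20:60000") # destination URI is illustrative
print(qmp("query-migrate"))                 # on the affected 1.3 build this is where the monitor stalls

With the affected build, the query-migrate call at the end is the point where the monitor stops answering, matching the "monitor hang" annotations in the log above.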
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Paolo Bonzini @ 2012-12-28 17:53 UTC
To: Alexandre DERUMIER
Cc: Juan Quintela, qemu-devel, Stefan Priebe

On 28/12/2012 08:05, Alexandre DERUMIER wrote:
> Hi list,
> after discussing with Stefan yesterday, here is some more info:
>
> (This is for stable qemu 1.3; it was working fine with qemu 1.2.)
>
> The problem seems to be that when setting migrate_set_downtime to 1 sec,
> the transfer sends all of the VM's memory in one step instead of
> incrementally, so the downtime is really huge: 90000 ms for 1GB of memory.

Can you try commit bde54c08b4854aceee3dee25121a2b835cb81166?

Paolo
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Stefan Priebe @ 2012-12-28 19:03 UTC
To: Paolo Bonzini
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

Hi Paolo,

On 28.12.2012 18:53, Paolo Bonzini wrote:
> On 28/12/2012 08:05, Alexandre DERUMIER wrote:
>> Hi list,
>> after discussing with Stefan yesterday, here is some more info:
>>
>> (This is for stable qemu 1.3; it was working fine with qemu 1.2.)
>>
>> The problem seems to be that when setting migrate_set_downtime to 1 sec,
>> the transfer sends all of the VM's memory in one step instead of
>> incrementally, so the downtime is really huge: 90000 ms for 1GB of memory.
>
> Can you try commit bde54c08b4854aceee3dee25121a2b835cb81166?

I cherry-picked that one on top of 1.3; sadly it does not help. The VM
halts, the monitor socket is no longer available, and the kvm process is
running with 100% CPU on the source side.

Greets,
Stefan
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Paolo Bonzini @ 2012-12-29 14:00 UTC
To: Stefan Priebe
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

On 28/12/2012 20:03, Stefan Priebe wrote:
> Hi Paolo,
>
> On 28.12.2012 18:53, Paolo Bonzini wrote:
>> On 28/12/2012 08:05, Alexandre DERUMIER wrote:
>>> Hi list,
>>> after discussing with Stefan yesterday, here is some more info:
>>>
>>> (This is for stable qemu 1.3; it was working fine with qemu 1.2.)
>>>
>>> The problem seems to be that when setting migrate_set_downtime to 1 sec,
>>> the transfer sends all of the VM's memory in one step instead of
>>> incrementally, so the downtime is really huge: 90000 ms for 1GB of
>>> memory.
>>
>> Can you try commit bde54c08b4854aceee3dee25121a2b835cb81166?
>
> I cherry-picked that one on top of 1.3; sadly it does not help. The VM
> halts, the monitor socket is no longer available, and the kvm process is
> running with 100% CPU on the source side.

Can you please test master and, if it works, bisect it in reverse? (That is,
mark "bad" if it works, "good" if it fails.)

Paolo
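[Editorial aside] A reverse bisection like this can also be driven by `git bisect run` with a small helper whose exit code is deliberately inverted. The sketch below is only an illustration: the build options are a guess, and the actual migration test is left as a manual step because it needs a second host.

#!/usr/bin/env python3
# reverse-bisect.py: helper for `git bisect run` when hunting for the commit that
# FIXED a bug. A revision where migration works is reported as "bad", and one
# where it still hangs is reported as "good". Build flags and the manual test
# prompt are placeholders; adapt them to the local setup.
import subprocess, sys

subprocess.check_call(["./configure", "--target-list=x86_64-softmmu"])
subprocess.check_call(["make", "-j8"])

# The migration test itself (set migrate_downtime to 1s and migrate the idle VM)
# has to be run by hand against a second host, so just ask for the verdict.
answer = input("Did the migration finish without the monitor hanging? [y/n] ").strip().lower()

# Inverted meaning: exit 0 tells bisect "good" (still broken), exit 1 means "bad" (already fixed).
sys.exit(1 if answer.startswith("y") else 0)

It would be started with the newer, working revision marked "bad", for example `git bisect start master v1.3.0`, followed by `git bisect run ./reverse-bisect.py`.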
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Stefan Priebe @ 2012-12-29 14:05 UTC
To: Paolo Bonzini
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

Hi Paolo,

On 29.12.2012 15:00, Paolo Bonzini wrote:
>> I cherry-picked that one on top of 1.3; sadly it does not help. The VM
>> halts, the monitor socket is no longer available, and the kvm process is
>> running with 100% CPU on the source side.
>
> Can you please test master and, if it works, bisect it in reverse? (That
> is, mark "bad" if it works, "good" if it fails.)

It's working fine with qemu master.

I then walked through the patchset here:
http://www.mail-archive.com/qemu-commits@nongnu.org/msg02028.html

and backported it to qemu 1.3 step by step.

It starts working for me after the first 22 patches (after introducing the
new mutex and threading for writes).

Greets,
Stefan
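[Editorial aside] A step-by-step backport of that kind can be scripted so that each intermediate state is tagged and testable. The sketch below only illustrates the workflow; the patch directory, branch name, and tag names are invented, and the real series would first be exported from master with `git format-patch`.

#!/usr/bin/env python3
# Apply an exported patch series onto a 1.3 branch one patch at a time, tagging
# each step so the first patch after which migration behaves can be identified.
# Paths and names are placeholders for illustration.
import glob, subprocess

patches = sorted(glob.glob("migration-series/*.patch"))

subprocess.check_call(["git", "checkout", "-B", "backport-test", "v1.3.0"])

for i, patch in enumerate(patches, start=1):
    subprocess.check_call(["git", "am", "-3", patch])           # 3-way apply helps on the older 1.3 base
    subprocess.check_call(["git", "tag", "-f", f"backport-{i:02d}"])
    print(f"applied {i}/{len(patches)}: {patch} - build and rerun the migration test")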
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Paolo Bonzini @ 2012-12-29 14:58 UTC
To: Stefan Priebe
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

On 29/12/2012 15:05, Stefan Priebe wrote:
> Hi Paolo,
> On 29.12.2012 15:00, Paolo Bonzini wrote:
>>> I cherry-picked that one on top of 1.3; sadly it does not help. The VM
>>> halts, the monitor socket is no longer available, and the kvm process
>>> is running with 100% CPU on the source side.
>>
>> Can you please test master and, if it works, bisect it in reverse? (That
>> is, mark "bad" if it works, "good" if it fails.)
>
> It's working fine with qemu master.
>
> I then walked through the patchset here:
> http://www.mail-archive.com/qemu-commits@nongnu.org/msg02028.html
>
> and backported it to qemu 1.3 step by step.
>
> It starts working for me after the first 22 patches (after introducing
> the new mutex and threading for writes).

And when does it break in 1.3?

I suppose it will be between 05e72dc5812a9f461fc2c606dff2572909eafc39 and
aa723c23147e93fef8475bd80fd29e633378c34d.

Probably at 2dddf6f4133975af62e64cb6406ec1239491fa89, which was supposed to
be placed before the switch to the migration thread (or even squashed into
it) but ended up much earlier when the project moved from me to Juan.

Paolo
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Stefan Priebe @ 2012-12-29 15:19 UTC
To: Paolo Bonzini
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

On 29.12.2012 15:58, Paolo Bonzini wrote:
> On 29/12/2012 15:05, Stefan Priebe wrote:
>> It starts working for me after the first 22 patches (after introducing
>> the new mutex and threading for writes).
>
> And when does it break in 1.3?
>
> I suppose it will be between 05e72dc5812a9f461fc2c606dff2572909eafc39
> and aa723c23147e93fef8475bd80fd29e633378c34d.
>
> Probably at 2dddf6f4133975af62e64cb6406ec1239491fa89, which was supposed
> to be placed before the switch to the migration thread (or even squashed
> into it) but ended up much earlier when the project moved from me to Juan.

You mean by bisecting between qemu 1.2 and 1.3? I retested qemu 1.2, and it
wasn't 100% working for me there either. It worked fine up to
migrate_downtime 1s, but it breaks / the VM just halts when I set it to 2s
with qemu 1.2. So I don't really know where to start bisecting, as I have NO
version where it worked perfectly, except qemu 1.3 with the patches
backported from 1.4; that works fine.

Greets,
Stefan
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Paolo Bonzini @ 2012-12-29 15:25 UTC
To: Stefan Priebe
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

On 29/12/2012 16:19, Stefan Priebe wrote:
>> I suppose it will be between 05e72dc5812a9f461fc2c606dff2572909eafc39
>> and aa723c23147e93fef8475bd80fd29e633378c34d.
>>
>> Probably at 2dddf6f4133975af62e64cb6406ec1239491fa89, which was supposed
>> to be placed before the switch to the migration thread (or even squashed
>> into it) but ended up much earlier when the project moved from me to
>> Juan.
>
> You mean by bisecting between qemu 1.2 and 1.3? I retested qemu 1.2, and
> it wasn't 100% working for me there either. It worked fine up to
> migrate_downtime 1s, but it breaks / the VM just halts when I set it to
> 2s with qemu 1.2. So I don't really know where to start bisecting, as I
> have NO version where it worked perfectly, except qemu 1.3 with the
> patches backported from 1.4; that works fine.

Bisect between the two commits I gave above. There will probably be a place
where it starts failing reliably.

Paolo
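[Editorial aside] Concretely, and assuming (as the messages above imply) that 05e72dc58... still works while aa723c231... already fails, this second, forward bisect could be started as sketched below; the rest is ordinary good/bad marking, or the same kind of run script as before but with the normal exit-code meaning.

#!/usr/bin/env python3
# Start a forward bisect inside the v1.2..v1.3 migration history: the newer
# commit is assumed to fail ("bad"), the older one to work ("good").
import subprocess

subprocess.check_call(["git", "bisect", "start",
                       "aa723c23147e93fef8475bd80fd29e633378c34d",   # bad: migration hangs
                       "05e72dc5812a9f461fc2c606dff2572909eafc39"])  # good: migration works
# After building and testing each checkout, mark it with
# `git bisect good` or `git bisect bad` until the first failing commit is found.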
* Re: [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3)
From: Stefan Priebe @ 2012-12-31 13:25 UTC
To: Paolo Bonzini
Cc: qemu-devel, Alexandre DERUMIER, Juan Quintela

When I try to cancel a running migration with qemu git, I get a segfault.

BT:
(gdb) bt
#0 _wordcopy_bwd_aligned (dstp=140051233112024, srcp=140051233112016, len=529920) at wordcopy.c:298
#1 0x00007f61dd7c86da in *__GI_memmove (dest=0x7f6037bf5010, src=<optimized out>, len=38118264) at memmove.c:99
#2 0x00007f61e2e973c9 in buffered_flush (s=0x7f61e33a9e60) at migration.c:546
#3 0x00007f61e2e9746c in buffered_close (opaque=0x7f61e33a9e60) at migration.c:598
#4 0x00007f61e2f758ff in qemu_fclose (f=0x7f6044fc3200) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/savevm.c:543
#5 0x00007f61e2e975b6 in migrate_fd_cleanup (s=0x7f61e33a9e60) at migration.c:277
#6 0x00007f61e2f7406b in handle_user_command (mon=0x7fffce7e3a90, cmdline=<optimized out>) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:3945
#7 0x00007f61e2f74279 in qmp_human_monitor_command (command_line=0x7f604c9361b0 "migrate_cancel", has_cpu_index=false, cpu_index=140051576672336, errp=0x7fffce7e3f68) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:664
#8 0x00007f61e2ecec07 in qmp_marshal_input_human_monitor_command (mon=<optimized out>, qdict=<optimized out>, ret=0x7fffce7e3ff0) at qmp-marshal.c:1505
#9 0x00007f61e2f6f53f in qmp_call_cmd (params=<optimized out>, cmd=<optimized out>, mon=<optimized out>) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:4446
#10 handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:4512
#11 0x00007f61e2e9039c in json_message_process_token (lexer=0x7f61d0012470, token=0x7f60389d6c60, type=JSON_OPERATOR, x=<optimized out>, y=<optimized out>) at json-streamer.c:87
#12 0x00007f61e2e8ec60 in json_lexer_feed_char (lexer=0x7f61d0012470, ch=125 '}', flush=false) at json-lexer.c:303
#13 0x00007f61e2e8ee19 in json_lexer_feed (lexer=0x7f61d0012470, buffer=0x7fffce7e41f0 "}\277\370M`\177", size=1) at json-lexer.c:356
#14 0x00007f61e2f6d65e in monitor_control_read (opaque=<optimized out>, buf=0x7f6040000000 " ", size=529920) at /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/monitor.c:4533
#15 0x00007f61e2ebedab in tcp_chr_read (opaque=0x7f61e4e1e610) at qemu-char.c:2325
#16 0x00007f61e2e8dac7 in qemu_iohandler_poll (readfds=0x7f61e37bc660, writefds=0x7f61e37bc6e0, xfds=<optimized out>, ret=<optimized out>) at iohandler.c:124
#17 0x00007f61e2e95f79 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:418
#18 0x00007f61e2f0f56c in main_loop () at vl.c:1768
#19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4047

Stefan

On 29.12.2012 16:25, Paolo Bonzini wrote:
> On 29/12/2012 16:19, Stefan Priebe wrote:
>>> I suppose it will be between 05e72dc5812a9f461fc2c606dff2572909eafc39
>>> and aa723c23147e93fef8475bd80fd29e633378c34d.
>>>
>>> Probably at 2dddf6f4133975af62e64cb6406ec1239491fa89, which was
>>> supposed to be placed before the switch to the migration thread (or
>>> even squashed into it) but ended up much earlier when the project
>>> moved from me to Juan.
>>
>> You mean by bisecting between qemu 1.2 and 1.3? I retested qemu 1.2,
>> and it wasn't 100% working for me there either. It worked fine up to
>> migrate_downtime 1s, but it breaks / the VM just halts when I set it to
>> 2s with qemu 1.2. So I don't really know where to start bisecting, as I
>> have NO version where it worked perfectly, except qemu 1.3 with the
>> patches backported from 1.4; that works fine.
>
> Bisect between the two commits I gave above. There will probably be a
> place where it starts failing reliably.
>
> Paolo
end of thread, newest: 2012-12-31 13:25 UTC

Thread overview: 10+ messages
2012-12-27 21:54 [Qemu-devel] setting migrate_downtime results in halted vm Stefan Priebe
2012-12-28  7:05 ` [Qemu-devel] setting migrate_downtime results in halted vm (qemu 1.3) Alexandre DERUMIER
2012-12-28 17:53   ` Paolo Bonzini
2012-12-28 19:03     ` Stefan Priebe
2012-12-29 14:00       ` Paolo Bonzini
2012-12-29 14:05         ` Stefan Priebe
2012-12-29 14:58           ` Paolo Bonzini
2012-12-29 15:19             ` Stefan Priebe
2012-12-29 15:25               ` Paolo Bonzini
2012-12-31 13:25                 ` Stefan Priebe