* [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Andrey Korolyov @ 2014-07-13 12:28 UTC
  To: qemu-devel@nongnu.org; +Cc: Paolo Bonzini, Fam Zheng

Hello,

the issue is not specific to the iothread code, because generic
virtio-blk also hangs:

Given a build like the one in
http://www.mail-archive.com/qemu-devel@nongnu.org/msg246164.html,
launch a VM with a virtio-blk disk on a writeback rbd backend, fire up
fio, and migrate once with libvirt:

time virsh migrate vm27842 qemu+tcp://10.6.0.10/system --live
--persistent --undefinesource --timeout 60


real    1m2.969s
user    0m0.016s
sys     0m0.008s

For those unfamiliar with libvirt syntax: the live migration failed to
complete within sixty seconds, so the VM was frozen, moved, and
re-launched at the destination. After this, I/O gets stuck forever. Any
diagnostic information is available on request if the issue turns out
to be hard to reproduce.
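
For reference, the disk attachment and load generator look roughly like
the following; the exact domain XML and fio job were not posted, so the
image name and fio parameters below are placeholders:

    # guest disk: virtio-blk on rbd with writeback caching (qemu syntax)
    qemu-system-x86_64 ... \
        -drive file=rbd:rbd/vm27842,format=raw,cache=writeback,if=virtio

    # inside the guest, against a dedicated test disk:
    fio --name=stress --filename=/dev/vdb --rw=randwrite --bs=4k \
        --iodepth=32 --ioengine=libaio --direct=1 --time_based --runtime=600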


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Andrey Korolyov @ 2014-07-13 15:29 UTC
  To: qemu-devel@nongnu.org; +Cc: Paolo Bonzini, Fam Zheng

On Sun, Jul 13, 2014 at 4:28 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> Hello,
>
> the issue is not specific to the iothread code, because generic
> virtio-blk also hangs:
>
> Given a build like the one in
> http://www.mail-archive.com/qemu-devel@nongnu.org/msg246164.html,
> launch a VM with a virtio-blk disk on a writeback rbd backend, fire up
> fio, and migrate once with libvirt:
>
> time virsh migrate vm27842 qemu+tcp://10.6.0.10/system --live
> --persistent --undefinesource --timeout 60
>
>
> real    1m2.969s
> user    0m0.016s
> sys     0m0.008s
>
> For those unfamiliar with libvirt syntax: the live migration failed to
> complete within sixty seconds, so the VM was frozen, moved, and
> re-launched at the destination. After this, I/O gets stuck forever. Any
> diagnostic information is available on request if the issue turns out
> to be hard to reproduce.

Small follow-up: the issue looks probabilistic. Over a limited number
of runs, it reproduces in three ways:
 1) live migration went well, but I/O locked up;
 2) live migration failed by timeout, and I/O locked up;
 3) live migration went well and the disk did not lock up, but on the
backward migration we always hit 2).


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Amit Shah @ 2014-07-15  5:03 UTC
  To: Andrey Korolyov; +Cc: Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> Hello,
> 
> the issue is not specific to the iothread code, because generic
> virtio-blk also hangs:

Do you know which version works well?  If you could bisect, that'll
help a lot.

Thanks,
		Amit


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Andrey Korolyov @ 2014-07-15  6:52 UTC
  To: Amit Shah; +Cc: Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com> wrote:
> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>> Hello,
>>
>> the issue is not specific to the iothread code, because generic
>> virtio-blk also hangs:
>
> Do you know which version works well?  If you could bisect, that'll
> help a lot.
>
> Thanks,
>                 Amit

Hi,

2.0 definitely works well. I'll try to finish the bisection today,
though every step takes about 10 minutes to complete.
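
(For the record, the bisection being run here amounts to something like
the following, assuming v2.0.0 as the known-good point:)

    git bisect start
    git bisect bad              # the current rc hangs after migration
    git bisect good v2.0.0      # 2.0 is known to work
    # at each step: build, run the fio-plus-migration test,
    # then mark the build with "git bisect good" or "git bisect bad"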


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Andrey Korolyov @ 2014-07-15 14:01 UTC
  To: Amit Shah
  Cc: Paolo Bonzini, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com> wrote:
>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>>> Hello,
>>>
>>> the issue is not specific to the iothread code, because generic
>>> virtio-blk also hangs:
>>
>> Do you know which version works well?  If you could bisect, that'll
>> help a lot.
>>
>> Thanks,
>>                 Amit
>
> Hi,
>
> 2.0 definitely works well. I'll try to finish the bisection today,
> though every step takes about 10 minutes to complete.

Yay! It is even outside of virtio-blk.

commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
Author: Marcelo Tosatti <mtosatti@redhat.com>
Date:   Tue Jun 3 13:34:48 2014 -0300

    kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation

    Ensure proper env->tsc value for kvmclock_current_nsec calculation.

    Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Paolo Bonzini @ 2014-07-15 15:57 UTC
  To: Andrey Korolyov, qemu-devel@nongnu.org; +Cc: Fam Zheng

On 13/07/2014 17:29, Andrey Korolyov wrote:
> Small follow-up: the issue looks probabilistic. Over a limited number
> of runs, it reproduces in three ways:
>  1) live migration went well, but I/O locked up;
>  2) live migration failed by timeout, and I/O locked up;
>  3) live migration went well and the disk did not lock up, but on the
> backward migration we always hit 2).

Can you provide a gdb backtrace of case (2)?

Paolo
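
(One convenient way to capture the all-threads backtraces that appear
later in this thread; the PID lookup is illustrative:)

    gdb -batch -p "$(pidof qemu-system-x86_64)" -ex 'thread apply all bt'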


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Andrey Korolyov @ 2014-07-15 17:32 UTC
  To: Paolo Bonzini; +Cc: Fam Zheng, qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 954 bytes --]

On Tue, Jul 15, 2014 at 7:57 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 13/07/2014 17:29, Andrey Korolyov wrote:
>
>> Small follow-up: the issue looks probabilistic. Over a limited number
>> of runs, it reproduces in three ways:
>>  1) live migration went well, but I/O locked up;
>>  2) live migration failed by timeout, and I/O locked up;
>>  3) live migration went well and the disk did not lock up, but on the
>> backward migration we always hit 2).
>
>
> Can you provide a gdb backtrace of case (2)?
>
> Paolo
>

Sorry, I mixed a reset into the picture. The timeout on migration
happens only if there was already a hang before the reset. bt-reset and
bt-noreset correspond to the two backtraces, the first of which had a
hang+reset beforehand. As the bisection suggests, the problem was
introduced by a fix targeted at exactly this behavior. Moreover,
rolling the problematic commit back
(9a833121eb253589143ff8fe30be8a311a2c16b3) fixes the situation for me.

[-- Attachment #2: bt-noreset.txt --]
[-- Type: text/plain, Size: 18103 bytes --]

Thread 33 (Thread 0x7fb632fcc700 (LWP 3023)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63ed7e51b in ceph::log::Log::entry (this=0x7fb642f13860) at log/Log.cc:323
#2  0x00007fb63b321e9a in start_thread (arg=0x7fb632fcc700) at pthread_create.c:308
#3  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 32 (Thread 0x7fb6327cb700 (LWP 3024)):
#0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102
#1  0x00007fb63ee57b18 in CephContextServiceThread::entry (this=0x7fb642f15630) at common/ceph_context.cc:57
#2  0x00007fb63b321e9a in start_thread (arg=0x7fb6327cb700) at pthread_create.c:308
#3  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 31 (Thread 0x7fb631fca700 (LWP 3025)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63ed9ae6b in Wait (mutex=..., this=0x7fb642f1afa0) at ./common/Cond.h:55
#2  DispatchQueue::entry (this=0x7fb642f1af38) at msg/DispatchQueue.cc:129
#3  0x00007fb63ee035ad in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb631fca700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 30 (Thread 0x7fb6317c9700 (LWP 3026)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63edff5f0 in Wait (mutex=..., this=0x7fb642f1b360) at ./common/Cond.h:55
#2  SimpleMessenger::reaper_entry (this=0x7fb642f1ae50) at msg/SimpleMessenger.cc:206
#3  0x00007fb63ee03e7d in SimpleMessenger::ReaperThread::entry (this=<optimized out>) at msg/SimpleMessenger.h:422
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb6317c9700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 29 (Thread 0x7fb630fc8700 (LWP 3027)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007fb63ed33238 in WaitUntil (when=..., mutex=..., this=0x7fb642f15a40) at common/Cond.h:71
#2  SafeTimer::timer_thread (this=0x7fb642f15a30) at common/Timer.cc:114
#3  0x00007fb63ed3416d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb630fc8700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 28 (Thread 0x7fb6307c7700 (LWP 3028)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63ed34bdf in Wait (mutex=..., this=0x7fb642f15b58) at ./common/Cond.h:55
#2  Finisher::finisher_thread_entry (this=0x7fb642f15af8) at common/Finisher.cc:80
#3  0x00007fb63b321e9a in start_thread (arg=0x7fb6307c7700) at pthread_create.c:308
#4  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()

Thread 27 (Thread 0x7fb62ffc6700 (LWP 3029)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63edeee1e in Wait (mutex=..., this=0x7fb642f1d1e8) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7fb642f1d000) at msg/Pipe.cc:1610
#3  0x00007fb63edfa1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb62ffc6700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 26 (Thread 0x7fb62fec5700 (LWP 3030)):
#0  0x00007fb63b043a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fb63ede3f0c in Pipe::tcp_read_wait (this=this@entry=0x7fb642f1d000) at msg/Pipe.cc:2144
#2  0x00007fb63ede41d0 in Pipe::tcp_read (this=this@entry=0x7fb642f1d000, buf=<optimized out>, buf@entry=0x7fb62fec4aef "\377\a", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007fb63edf610e in Pipe::reader (this=0x7fb642f1d000) at msg/Pipe.cc:1353
#4  0x00007fb63edfa2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007fb63b321e9a in start_thread (arg=0x7fb62fec5700) at pthread_create.c:308
#6  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 25 (Thread 0x7fb62fdc4700 (LWP 3031)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007fb63ed33238 in WaitUntil (when=..., mutex=..., this=0x7fb642f15e48) at common/Cond.h:71
#2  SafeTimer::timer_thread (this=0x7fb642f15e38) at common/Timer.cc:114
#3  0x00007fb63ed3416d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb62fdc4700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 24 (Thread 0x7fb62f5c3700 (LWP 3032)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63ed34bdf in Wait (mutex=..., this=0x7fb642f15f68) at ./common/Cond.h:55
#2  Finisher::finisher_thread_entry (this=0x7fb642f15f08) at common/Finisher.cc:80
#3  0x00007fb63b321e9a in start_thread (arg=0x7fb62f5c3700) at pthread_create.c:308
#4  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()

Thread 23 (Thread 0x7fb62edc2700 (LWP 3033)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63ed34bdf in Wait (mutex=..., this=0x7fb642f20858) at ./common/Cond.h:55
#2  Finisher::finisher_thread_entry (this=0x7fb642f207f8) at common/Finisher.cc:80
#3  0x00007fb63b321e9a in start_thread (arg=0x7fb62edc2700) at pthread_create.c:308
#4  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()

Thread 22 (Thread 0x7fb62e5c1700 (LWP 3034)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007fb63f9890bc in WaitUntil (when=..., mutex=..., this=0x7fb642f20798) at ./common/Cond.h:71
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x7fb642f20798) at ./common/Cond.h:79
#3  ObjectCacher::flusher_entry (this=0x7fb642f205c0) at osdc/ObjectCacher.cc:1458
#4  0x00007fb63f99c20d in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:354
#5  0x00007fb63b321e9a in start_thread (arg=0x7fb62e5c1700) at pthread_create.c:308
#6  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 21 (Thread 0x7fb62ddc0700 (LWP 3035)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63edeee1e in Wait (mutex=..., this=0x7fb642f21548) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7fb642f21360) at msg/Pipe.cc:1610
#3  0x00007fb63edfa1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb62ddc0700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 20 (Thread 0x7fb62dcbf700 (LWP 3036)):
#0  0x00007fb63b043a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fb63ede3f0c in Pipe::tcp_read_wait (this=this@entry=0x7fb642f21360) at msg/Pipe.cc:2144
#2  0x00007fb63ede41d0 in Pipe::tcp_read (this=this@entry=0x7fb642f21360, buf=<optimized out>, buf@entry=0x7fb62dcbeaef "\377\001", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007fb63edf610e in Pipe::reader (this=0x7fb642f21360) at msg/Pipe.cc:1353
#4  0x00007fb63edfa2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007fb63b321e9a in start_thread (arg=0x7fb62dcbf700) at pthread_create.c:308
#6  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 19 (Thread 0x7fb62dbbe700 (LWP 3037)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63edeee1e in Wait (mutex=..., this=0x7fb642f21be8) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7fb642f21a00) at msg/Pipe.cc:1610
#3  0x00007fb63edfa1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb62dbbe700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 18 (Thread 0x7fb62dabd700 (LWP 3038)):
#0  0x00007fb63b043a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fb63ede3f0c in Pipe::tcp_read_wait (this=this@entry=0x7fb642f21a00) at msg/Pipe.cc:2144
#2  0x00007fb63ede41d0 in Pipe::tcp_read (this=this@entry=0x7fb642f21a00, buf=<optimized out>, buf@entry=0x7fb62dabcaef "\377\002", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007fb63edf610e in Pipe::reader (this=0x7fb642f21a00) at msg/Pipe.cc:1353
#4  0x00007fb63edfa2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007fb63b321e9a in start_thread (arg=0x7fb62dabd700) at pthread_create.c:308
#6  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 17 (Thread 0x7fb62d9bc700 (LWP 3039)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63edeee1e in Wait (mutex=..., this=0x7fb642f224f8) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7fb642f22310) at msg/Pipe.cc:1610
#3  0x00007fb63edfa1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb62d9bc700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 16 (Thread 0x7fb62d8bb700 (LWP 3040)):
#0  0x00007fb63b043a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fb63ede3f0c in Pipe::tcp_read_wait (this=this@entry=0x7fb642f22310) at msg/Pipe.cc:2144
#2  0x00007fb63ede41d0 in Pipe::tcp_read (this=this@entry=0x7fb642f22310, buf=<optimized out>, buf@entry=0x7fb62d8baaef "\377\033", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007fb63edf610e in Pipe::reader (this=0x7fb642f22310) at msg/Pipe.cc:1353
#4  0x00007fb63edfa2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007fb63b321e9a in start_thread (arg=0x7fb62d8bb700) at pthread_create.c:308
#6  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 15 (Thread 0x7fb62d7ba700 (LWP 3041)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb62d7ba700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 14 (Thread 0x7fb62cfb9700 (LWP 3042)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb62cfb9700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 13 (Thread 0x7fb5fffff700 (LWP 3043)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5fffff700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 12 (Thread 0x7fb5ff7fe700 (LWP 3044)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5ff7fe700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 11 (Thread 0x7fb5feffd700 (LWP 3045)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5feffd700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7fb5fe7fc700 (LWP 3046)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5fe7fc700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7fb5fdffb700 (LWP 3047)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5fdffb700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7fb5fd7fa700 (LWP 3048)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5fd7fa700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7fb5fcff9700 (LWP 3049)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5fcff9700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7fb5dffff700 (LWP 3050)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5dffff700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7fb5df7fe700 (LWP 3051)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5df7fe700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fb5deffd700 (LWP 3052)):
#0  0x00007fb63b047c37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb6405560a9 in ?? ()
#2  0x00007fb6405561e5 in ?? ()
#3  0x00007fb6405419bc in ?? ()
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb5deffd700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fb62c5ff700 (LWP 3054)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb63edeee1e in Wait (mutex=..., this=0x7fb643060548) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7fb643060360) at msg/Pipe.cc:1610
#3  0x00007fb63edfa1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007fb63b321e9a in start_thread (arg=0x7fb62c5ff700) at pthread_create.c:308
#5  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fb62c3ff700 (LWP 3055)):
#0  0x00007fb63b043a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fb63ede3f0c in Pipe::tcp_read_wait (this=this@entry=0x7fb643060360) at msg/Pipe.cc:2144
#2  0x00007fb63ede41d0 in Pipe::tcp_read (this=this@entry=0x7fb643060360, buf=<optimized out>, buf@entry=0x7fb62c3feaef "\377\001", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007fb63edf610e in Pipe::reader (this=0x7fb643060360) at msg/Pipe.cc:1353
#4  0x00007fb63edfa2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007fb63b321e9a in start_thread (arg=0x7fb62c3ff700) at pthread_create.c:308
#6  0x00007fb63b04f3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fb640349940 (LWP 3020)):
#0  0x00007fb63b043ae3 in ppoll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>, sigmask=<optimized out>) at ../sysdeps/unix/sysv/linux/ppoll.c:58
#1  0x00007fb6407b6679 in ?? ()
#2  0x00007fb6407b5a64 in ?? ()
#3  0x00007fb640513665 in main ()


[-- Attachment #3: bt-reset.txt --]
[-- Type: text/plain, Size: 18099 bytes --]

Thread 33 (Thread 0x7f0098460700 (LWP 2886)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a421251b in ceph::log::Log::entry (this=0x7f00a7de5860) at log/Log.cc:323
#2  0x00007f00a07b5e9a in start_thread (arg=0x7f0098460700) at pthread_create.c:308
#3  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 32 (Thread 0x7f0097c5f700 (LWP 2887)):
#0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102
#1  0x00007f00a42ebb18 in CephContextServiceThread::entry (this=0x7f00a7de7630) at common/ceph_context.cc:57
#2  0x00007f00a07b5e9a in start_thread (arg=0x7f0097c5f700) at pthread_create.c:308
#3  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 31 (Thread 0x7f009745e700 (LWP 2888)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a422ee6b in Wait (mutex=..., this=0x7f00a7decfa0) at ./common/Cond.h:55
#2  DispatchQueue::entry (this=0x7f00a7decf38) at msg/DispatchQueue.cc:129
#3  0x00007f00a42975ad in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f009745e700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 30 (Thread 0x7f0096c5d700 (LWP 2889)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a42935f0 in Wait (mutex=..., this=0x7f00a7ded360) at ./common/Cond.h:55
#2  SimpleMessenger::reaper_entry (this=0x7f00a7dece50) at msg/SimpleMessenger.cc:206
#3  0x00007f00a4297e7d in SimpleMessenger::ReaperThread::entry (this=<optimized out>) at msg/SimpleMessenger.h:422
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0096c5d700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 29 (Thread 0x7f009645c700 (LWP 2890)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007f00a41c7238 in WaitUntil (when=..., mutex=..., this=0x7f00a7de7a40) at common/Cond.h:71
#2  SafeTimer::timer_thread (this=0x7f00a7de7a30) at common/Timer.cc:114
#3  0x00007f00a41c816d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f009645c700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 28 (Thread 0x7f0095c5b700 (LWP 2891)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a41c8bdf in Wait (mutex=..., this=0x7f00a7de7b58) at ./common/Cond.h:55
#2  Finisher::finisher_thread_entry (this=0x7f00a7de7af8) at common/Finisher.cc:80
#3  0x00007f00a07b5e9a in start_thread (arg=0x7f0095c5b700) at pthread_create.c:308
#4  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()

Thread 27 (Thread 0x7f009545a700 (LWP 2892)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a4282e1e in Wait (mutex=..., this=0x7f00a7def1e8) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7f00a7def000) at msg/Pipe.cc:1610
#3  0x00007f00a428e1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f009545a700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 26 (Thread 0x7f0095359700 (LWP 2893)):
#0  0x00007f00a04d7a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f00a4277f0c in Pipe::tcp_read_wait (this=this@entry=0x7f00a7def000) at msg/Pipe.cc:2144
#2  0x00007f00a42781d0 in Pipe::tcp_read (this=this@entry=0x7f00a7def000, buf=<optimized out>, buf@entry=0x7f0095358aef "\377\b", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007f00a428a10e in Pipe::reader (this=0x7f00a7def000) at msg/Pipe.cc:1353
#4  0x00007f00a428e2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007f00a07b5e9a in start_thread (arg=0x7f0095359700) at pthread_create.c:308
#6  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 25 (Thread 0x7f0095258700 (LWP 2894)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007f00a41c7238 in WaitUntil (when=..., mutex=..., this=0x7f00a7de7e48) at common/Cond.h:71
#2  SafeTimer::timer_thread (this=0x7f00a7de7e38) at common/Timer.cc:114
#3  0x00007f00a41c816d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0095258700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 24 (Thread 0x7f0094a57700 (LWP 2895)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a41c8bdf in Wait (mutex=..., this=0x7f00a7de7f68) at ./common/Cond.h:55
#2  Finisher::finisher_thread_entry (this=0x7f00a7de7f08) at common/Finisher.cc:80
#3  0x00007f00a07b5e9a in start_thread (arg=0x7f0094a57700) at pthread_create.c:308
#4  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()

Thread 23 (Thread 0x7f0087fff700 (LWP 2896)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a41c8bdf in Wait (mutex=..., this=0x7f00a7df2858) at ./common/Cond.h:55
#2  Finisher::finisher_thread_entry (this=0x7f00a7df27f8) at common/Finisher.cc:80
#3  0x00007f00a07b5e9a in start_thread (arg=0x7f0087fff700) at pthread_create.c:308
#4  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()

Thread 22 (Thread 0x7f00877fe700 (LWP 2897)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007f00a4e1d0bc in WaitUntil (when=..., mutex=..., this=0x7f00a7df2798) at ./common/Cond.h:71
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x7f00a7df2798) at ./common/Cond.h:79
#3  ObjectCacher::flusher_entry (this=0x7f00a7df25c0) at osdc/ObjectCacher.cc:1458
#4  0x00007f00a4e3020d in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:354
#5  0x00007f00a07b5e9a in start_thread (arg=0x7f00877fe700) at pthread_create.c:308
#6  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 21 (Thread 0x7f0094256700 (LWP 2898)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a4282e1e in Wait (mutex=..., this=0x7f00a7df3548) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7f00a7df3360) at msg/Pipe.cc:1610
#3  0x00007f00a428e1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0094256700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 20 (Thread 0x7f0094155700 (LWP 2899)):
#0  0x00007f00a04d7a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f00a4277f0c in Pipe::tcp_read_wait (this=this@entry=0x7f00a7df3360) at msg/Pipe.cc:2144
#2  0x00007f00a42781d0 in Pipe::tcp_read (this=this@entry=0x7f00a7df3360, buf=<optimized out>, buf@entry=0x7f0094154aef "\377\001", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007f00a428a10e in Pipe::reader (this=0x7f00a7df3360) at msg/Pipe.cc:1353
#4  0x00007f00a428e2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007f00a07b5e9a in start_thread (arg=0x7f0094155700) at pthread_create.c:308
#6  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 19 (Thread 0x7f0086ffd700 (LWP 2900)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a4282e1e in Wait (mutex=..., this=0x7f00a7df3be8) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7f00a7df3a00) at msg/Pipe.cc:1610
#3  0x00007f00a428e1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0086ffd700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 18 (Thread 0x7f0086efc700 (LWP 2901)):
#0  0x00007f00a04d7a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f00a4277f0c in Pipe::tcp_read_wait (this=this@entry=0x7f00a7df3a00) at msg/Pipe.cc:2144
#2  0x00007f00a42781d0 in Pipe::tcp_read (this=this@entry=0x7f00a7df3a00, buf=<optimized out>, buf@entry=0x7f0086efbaef "\377\002", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007f00a428a10e in Pipe::reader (this=0x7f00a7df3a00) at msg/Pipe.cc:1353
#4  0x00007f00a428e2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007f00a07b5e9a in start_thread (arg=0x7f0086efc700) at pthread_create.c:308
#6  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 17 (Thread 0x7f0086dfb700 (LWP 2902)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a4282e1e in Wait (mutex=..., this=0x7f00a7df44f8) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7f00a7df4310) at msg/Pipe.cc:1610
#3  0x00007f00a428e1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0086dfb700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 16 (Thread 0x7f0086cfa700 (LWP 2903)):
#0  0x00007f00a04d7a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f00a4277f0c in Pipe::tcp_read_wait (this=this@entry=0x7f00a7df4310) at msg/Pipe.cc:2144
#2  0x00007f00a42781d0 in Pipe::tcp_read (this=this@entry=0x7f00a7df4310, buf=<optimized out>, buf@entry=0x7f0086cf9aef "\377.", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007f00a428a10e in Pipe::reader (this=0x7f00a7df4310) at msg/Pipe.cc:1353
#4  0x00007f00a428e2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007f00a07b5e9a in start_thread (arg=0x7f0086cfa700) at pthread_create.c:308
#6  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 15 (Thread 0x7f0086bf9700 (LWP 2904)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0086bf9700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 14 (Thread 0x7f00863f8700 (LWP 2905)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f00863f8700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 13 (Thread 0x7f0085bf7700 (LWP 2906)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0085bf7700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 12 (Thread 0x7f00853f6700 (LWP 2907)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f00853f6700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 11 (Thread 0x7f0084bf5700 (LWP 2908)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0084bf5700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7f0057fff700 (LWP 2909)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0057fff700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f00577fe700 (LWP 2910)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f00577fe700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f0056ffd700 (LWP 2911)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0056ffd700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f00567fc700 (LWP 2912)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f00567fc700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f0055ffb700 (LWP 2913)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0055ffb700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f00557fa700 (LWP 2914)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f00557fa700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f0054ff9700 (LWP 2915)):
#0  0x00007f00a04dbc37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f00a59ea0a9 in ?? ()
#2  0x00007f00a59ea1e5 in ?? ()
#3  0x00007f00a59d59bc in ?? ()
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f0054ff9700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f00841ff700 (LWP 2917)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f00a4282e1e in Wait (mutex=..., this=0x7f00a7f32548) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7f00a7f32360) at msg/Pipe.cc:1610
#3  0x00007f00a428e1dd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007f00a07b5e9a in start_thread (arg=0x7f00841ff700) at pthread_create.c:308
#5  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f00547f8700 (LWP 2918)):
#0  0x00007f00a04d7a13 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f00a4277f0c in Pipe::tcp_read_wait (this=this@entry=0x7f00a7f32360) at msg/Pipe.cc:2144
#2  0x00007f00a42781d0 in Pipe::tcp_read (this=this@entry=0x7f00a7f32360, buf=<optimized out>, buf@entry=0x7f00547f7aef "\377\001", len=len@entry=1) at msg/Pipe.cc:2117
#3  0x00007f00a428a10e in Pipe::reader (this=0x7f00a7f32360) at msg/Pipe.cc:1353
#4  0x00007f00a428e2fd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007f00a07b5e9a in start_thread (arg=0x7f00547f8700) at pthread_create.c:308
#6  0x00007f00a04e33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f00a57dd940 (LWP 2883)):
#0  0x00007f00a04d7ae3 in ppoll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>, sigmask=<optimized out>) at ../sysdeps/unix/sysv/linux/ppoll.c:58
#1  0x00007f00a5c4a679 in ?? ()
#2  0x00007f00a5c49a64 in ?? ()
#3  0x00007f00a59a7665 in main ()


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Andrey Korolyov @ 2014-07-15 17:39 UTC
  To: Paolo Bonzini; +Cc: Fam Zheng, qemu-devel@nongnu.org

On Tue, Jul 15, 2014 at 9:32 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Tue, Jul 15, 2014 at 7:57 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 13/07/2014 17:29, Andrey Korolyov wrote:
>>
>>> Small follow-up: the issue looks probabilistic. Over a limited number
>>> of runs, it reproduces in three ways:
>>>  1) live migration went well, but I/O locked up;
>>>  2) live migration failed by timeout, and I/O locked up;
>>>  3) live migration went well and the disk did not lock up, but on the
>>> backward migration we always hit 2).
>>
>>
>> Can you provide a gdb backtrace of case (2)?
>>
>> Paolo
>>
>
> Sorry, I mixed a reset into the picture. The timeout on migration
> happens only if there was already a hang before the reset. bt-reset and
> bt-noreset correspond to the two backtraces, the first of which had a
> hang+reset beforehand. As the bisection suggests, the problem was
> introduced by a fix targeted at exactly this behavior. Moreover,
> rolling the problematic commit back
> (9a833121eb253589143ff8fe30be8a311a2c16b3) fixes the situation for me.

Sorry, I gave the wrong hash; the right one is
9b1786829aefb83f37a8f3135e3ea91c56001b56. I am referring to
http://marc.info/?l=qemu-devel&m=140174116403095&w=2 above. I also
forgot to mention that the two backtraces are absolutely identical, so
the problem is certainly linked to the emulator state.
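
(Rolling the commit back locally amounts to, assuming a clean revert:)

    git revert 9b1786829aefb83f37a8f3135e3ea91c56001b56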


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Marcelo Tosatti @ 2014-07-15 21:09 UTC
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru> wrote:
> > On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com> wrote:
> >> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> >>> Hello,
> >>>
> >>> the issue is not specific to the iothread code, because generic
> >>> virtio-blk also hangs:
> >>
> >> Do you know which version works well?  If you could bisect, that'll
> >> help a lot.
> >>
> >> Thanks,
> >>                 Amit
> >
> > Hi,
> >
> > 2.0 definitely works well. I'll try to finish the bisection today,
> > though every step takes about 10 minutes to complete.
> 
> Yay! It is even outside of virtio-blk.
> 
> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> Author: Marcelo Tosatti <mtosatti@redhat.com>
> Date:   Tue Jun 3 13:34:48 2014 -0300
> 
>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
> 
>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
> 
>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
>     Cc: qemu-stable@nongnu.org
>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Andrey,

Can you please provide instructions on how to create a reproducible
environment?

The following patch is equivalent to the original patch, for
the purposes of fixing the kvmclock problem.

Perhaps that will make it easier to spot the reason for the hang you
are experiencing.


diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
 
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
 
-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
         if (s->clock_valid) {
             return;
         }
-
-        cpu_synchronize_all_states();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..34f2325 100644
--- a/migration.c
+++ b/migration.c
@@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
                 old_vm_running = runstate_is_running();
 
+                cpu_synchronize_all_states();
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT64_MAX);


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Andrey Korolyov @ 2014-07-15 21:25 UTC
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru> wrote:
>> > On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com> wrote:
>> >> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>> >>> Hello,
>> >>>
>> >>> the issue is not specific to the iothread code, because generic
>> >>> virtio-blk also hangs:
>> >>
>> >> Do you know which version works well?  If you could bisect, that'll
>> >> help a lot.
>> >>
>> >> Thanks,
>> >>                 Amit
>> >
>> > Hi,
>> >
>> > 2.0 definitely works well. I'll try to finish the bisection today,
>> > though every step takes about 10 minutes to complete.
>>
>> Yay! It is even outside of virtio-blk.
>>
>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>> Author: Marcelo Tosatti <mtosatti@redhat.com>
>> Date:   Tue Jun 3 13:34:48 2014 -0300
>>
>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
>>
>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
>>
>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
>>     Cc: qemu-stable@nongnu.org
>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Andrey,
>
> Can you please provide instructions on how to create a reproducible
> environment?
>
> The following patch is equivalent to the original patch, for
> the purposes of fixing the kvmclock problem.
>
> Perhaps that will make it easier to spot the reason for the hang you
> are experiencing.
>
>
> [Marcelo's patch quoted in full above, snipped]

Marcelo, I do not see an easier way than creating a PoC deployment
(involving at least two separate physical nodes) that acts both as
sender and receiver for the migration and hosts the Ceph storage
(http://ceph.com/docs/master/start/). For simplicity you probably want
to disable cephx, so the secret does not have to go on the CLI. You can
also get a minimal qemu-ready installation using Mirantis' Fuel with
the Ceph deployment settings (it will deploy some OpenStack too as a
side effect, but the main reason to go this way is the very high level
of provisioning automation: you get the necessary multi-node
environment with an RBD backend in a matter of some clicks and some
hours). In the meantime, I'll try to reproduce the issue with iSCSI,
because I do not want to mess with shared storage and the sanlock
plugin.
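
Stripped to its essentials, the environment amounts to something like
the following; pool, image, and host names are placeholders:

    # a small Ceph cluster reachable from both hypervisors,
    # cephx disabled for simplicity, plus a test image:
    rbd create rbd/vm27842 --size 10240

    # identical qemu builds on both hypervisors, libvirtd listening on
    # tcp, then migrate back and forth while fio runs in the guest:
    virsh migrate vm27842 qemu+tcp://10.6.0.10/system --live \
        --persistent --undefinesource --timeout 60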


* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
From: Paolo Bonzini @ 2014-07-15 22:01 UTC
  To: Andrey Korolyov, Marcelo Tosatti
  Cc: Amit Shah, Fam Zheng, qemu-devel@nongnu.org

On 15/07/2014 23:25, Andrey Korolyov wrote:
> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru> wrote:
>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com> wrote:
>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>>>>>> Hello,
>>>>>>
>>>>>> the issue is not specific to the iothread code, because generic
>>>>>> virtio-blk also hangs:
>>>>>
>>>>> Do you know which version works well?  If you could bisect, that'll
>>>>> help a lot.
>>>>>
>>>>> Thanks,
>>>>>                 Amit
>>>>
>>>> Hi,
>>>>
>>>> 2.0 definitely works well. I'll try to finish the bisection today,
>>>> though every step takes about 10 minutes to complete.
>>>
>>> Yay! It is even outside of virtio-blk.
>>>
>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
>>> Date:   Tue Jun 3 13:34:48 2014 -0300
>>>
>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
>>>
>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
>>>
>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
>>>     Cc: qemu-stable@nongnu.org
>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> Andrey,
>>
>> Can you please provide instructions on how to create a reproducible
>> environment?
>>
>> The following patch is equivalent to the original patch, for
>> the purposes of fixing the kvmclock problem.
>>
>> Perhaps that will make it easier to spot the reason for the hang you
>> are experiencing.
>>
>>
>> [Marcelo's patch quoted in full above, snipped]

It could also be useful to apply the above patch _and_ revert 
a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
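
Something along these lines (sketch only; the revert and the patch
touch neighboring lines, so expect to resolve a small conflict by
hand - the patch file name is a placeholder for the inline diff
above):

  git revert a096b3a6732f846ec57dc28b47ee9435aa0609bf
  git apply move-cpu-synchronize.patch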

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-15 22:01           ` Paolo Bonzini
@ 2014-07-15 23:40             ` Andrey Korolyov
  2014-07-15 23:47               ` Marcelo Tosatti
  2014-07-16  1:16               ` Marcelo Tosatti
  0 siblings, 2 replies; 76+ messages in thread
From: Andrey Korolyov @ 2014-07-15 23:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>
>> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
>> wrote:
>>>
>>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>>>>
>>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
>>>> wrote:
>>>>>
>>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> the issue is not specific to the iothread code because generic
>>>>>>> virtio-blk also hangs up:
>>>>>>
>>>>>>
>>>>>> Do you know which version works well?  If you could bisect, that'll
>>>>>> help a lot.
>>>>>>
>>>>>> Thanks,
>>>>>>                 Amit
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
>>>>> every step takes about 10 minutes to complete.
>>>>
>>>>
>>>> Yay! It is even outside of virtio-blk.
>>>>
>>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
>>>> Date:   Tue Jun 3 13:34:48 2014 -0300
>>>>
>>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>>>> calculation
>>>>
>>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
>>>>
>>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
>>>>     Cc: qemu-stable@nongnu.org
>>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>>
>>> Andrey,
>>>
>>> Can you please provide instructions on how to create reproducible
>>> environment?
>>>
>>> The following patch is equivalent to the original patch, for
>>> the purposes of fixing the kvmclock problem.
>>>
>>> Perhaps it becomes easier to spot the reason for the hang you are
>>> experiencing.
>>>
>>>
>>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>>> index 272a88a..feb5fc5 100644
>>> --- a/hw/i386/kvm/clock.c
>>> +++ b/hw/i386/kvm/clock.c
>>> @@ -17,7 +17,6 @@
>>>  #include "qemu/host-utils.h"
>>>  #include "sysemu/sysemu.h"
>>>  #include "sysemu/kvm.h"
>>> -#include "sysemu/cpus.h"
>>>  #include "hw/sysbus.h"
>>>  #include "hw/kvm/clock.h"
>>>
>>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>>>
>>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>>>
>>> -    assert(time.tsc_timestamp <= migration_tsc);
>>>      delta = migration_tsc - time.tsc_timestamp;
>>>      if (time.tsc_shift < 0) {
>>>          delta >>= -time.tsc_shift;
>>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
>>> int running,
>>>          if (s->clock_valid) {
>>>              return;
>>>          }
>>> -
>>> -        cpu_synchronize_all_states();
>>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>>>          if (ret < 0) {
>>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>>> strerror(ret));
>>> diff --git a/migration.c b/migration.c
>>> index 8d675b3..34f2325 100644
>>> --- a/migration.c
>>> +++ b/migration.c
>>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>>>                  old_vm_running = runstate_is_running();
>>>
>>> +                cpu_synchronize_all_states();
>>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>>                  if (ret >= 0) {
>>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
>
>
> It could also be useful to apply the above patch _and_ revert
> a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>
> Paolo

Yes, that solved the issue for me! (it took a long time to check
because most of the country's Debian mirrors went inconsistent for
some reason)

Also, a trivial addition is needed, since cpu_synchronize_all_states()
is declared in sysemu/cpus.h:

diff --git a/migration.c b/migration.c
index 34f2325..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-15 23:40             ` Andrey Korolyov
@ 2014-07-15 23:47               ` Marcelo Tosatti
  2014-07-16  1:16               ` Marcelo Tosatti
  1 sibling, 0 replies; 76+ messages in thread
From: Marcelo Tosatti @ 2014-07-15 23:47 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
> >
> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
> >> wrote:
> >>>
> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> >>>>
> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
> >>>> wrote:
> >>>>>
> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> the issue is not specific to the iothread code because generic
> >>>>>>> virtio-blk also hangs up:
> >>>>>>
> >>>>>>
> >>>>>> Do you know which version works well?  If you could bisect, that'll
> >>>>>> help a lot.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>                 Amit
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
> >>>>> every step takes about 10 minutes to complete.
> >>>>
> >>>>
> >>>> Yay! It is even outside of virtio-blk.
> >>>>
> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> >>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
> >>>>
> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
> >>>> calculation
> >>>>
> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
> >>>>
> >>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
> >>>>     Cc: qemu-stable@nongnu.org
> >>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>
> >>>
> >>> Andrey,
> >>>
> >>> Can you please provide instructions on how to create reproducible
> >>> environment?
> >>>
> >>> The following patch is equivalent to the original patch, for
> >>> the purposes of fixing the kvmclock problem.
> >>>
> >>> Perhaps it becomes easier to spot the reason for the hang you are
> >>> experiencing.
> >>>
> >>>
> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> >>> index 272a88a..feb5fc5 100644
> >>> --- a/hw/i386/kvm/clock.c
> >>> +++ b/hw/i386/kvm/clock.c
> >>> @@ -17,7 +17,6 @@
> >>>  #include "qemu/host-utils.h"
> >>>  #include "sysemu/sysemu.h"
> >>>  #include "sysemu/kvm.h"
> >>> -#include "sysemu/cpus.h"
> >>>  #include "hw/sysbus.h"
> >>>  #include "hw/kvm/clock.h"
> >>>
> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
> >>>
> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> >>>
> >>> -    assert(time.tsc_timestamp <= migration_tsc);
> >>>      delta = migration_tsc - time.tsc_timestamp;
> >>>      if (time.tsc_shift < 0) {
> >>>          delta >>= -time.tsc_shift;
> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
> >>> int running,
> >>>          if (s->clock_valid) {
> >>>              return;
> >>>          }
> >>> -
> >>> -        cpu_synchronize_all_states();
> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> >>>          if (ret < 0) {
> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
> >>> strerror(ret));
> >>> diff --git a/migration.c b/migration.c
> >>> index 8d675b3..34f2325 100644
> >>> --- a/migration.c
> >>> +++ b/migration.c
> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> >>>                  old_vm_running = runstate_is_running();
> >>>
> >>> +                cpu_synchronize_all_states();
> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >>>                  if (ret >= 0) {
> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
> >
> >
> > It could also be useful to apply the above patch _and_ revert
> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
> >
> > Paolo
> 
> Yes, it solved the issue for me! (it took much time to check because
> most of country` debian mirrors went inconsistent by some reason)
> 
> Also trivial addition:
> 
> diff --git a/migration.c b/migration.c
> index 34f2325..65d1c88 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -25,6 +25,7 @@
>  #include "qemu/thread.h"
>  #include "qmp-commands.h"
>  #include "trace.h"
> +#include "sysemu/cpus.h"

Can you please attach a 'git diff' of the qemu tree where the test
was successful?
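
Something like this from the tree root would do (sketch):

  git log --oneline -1 > successful-tree.txt
  git diff >> successful-tree.txt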

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-15 23:40             ` Andrey Korolyov
  2014-07-15 23:47               ` Marcelo Tosatti
@ 2014-07-16  1:16               ` Marcelo Tosatti
  2014-07-16  8:38                 ` Andrey Korolyov
  1 sibling, 1 reply; 76+ messages in thread
From: Marcelo Tosatti @ 2014-07-16  1:16 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
> >
> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
> >> wrote:
> >>>
> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> >>>>
> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
> >>>> wrote:
> >>>>>
> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> the issue is not specific to the iothread code because generic
> >>>>>>> virtio-blk also hangs up:
> >>>>>>
> >>>>>>
> >>>>>> Do you know which version works well?  If you could bisect, that'll
> >>>>>> help a lot.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>                 Amit
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
> >>>>> every step takes about 10 minutes to complete.
> >>>>
> >>>>
> >>>> Yay! It is even outside of virtio-blk.
> >>>>
> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> >>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
> >>>>
> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
> >>>> calculation
> >>>>
> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
> >>>>
> >>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
> >>>>     Cc: qemu-stable@nongnu.org
> >>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>
> >>>
> >>> Andrey,
> >>>
> >>> Can you please provide instructions on how to create reproducible
> >>> environment?
> >>>
> >>> The following patch is equivalent to the original patch, for
> >>> the purposes of fixing the kvmclock problem.
> >>>
> >>> Perhaps it becomes easier to spot the reason for the hang you are
> >>> experiencing.
> >>>
> >>>
> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> >>> index 272a88a..feb5fc5 100644
> >>> --- a/hw/i386/kvm/clock.c
> >>> +++ b/hw/i386/kvm/clock.c
> >>> @@ -17,7 +17,6 @@
> >>>  #include "qemu/host-utils.h"
> >>>  #include "sysemu/sysemu.h"
> >>>  #include "sysemu/kvm.h"
> >>> -#include "sysemu/cpus.h"
> >>>  #include "hw/sysbus.h"
> >>>  #include "hw/kvm/clock.h"
> >>>
> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
> >>>
> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> >>>
> >>> -    assert(time.tsc_timestamp <= migration_tsc);
> >>>      delta = migration_tsc - time.tsc_timestamp;
> >>>      if (time.tsc_shift < 0) {
> >>>          delta >>= -time.tsc_shift;
> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
> >>> int running,
> >>>          if (s->clock_valid) {
> >>>              return;
> >>>          }
> >>> -
> >>> -        cpu_synchronize_all_states();
> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> >>>          if (ret < 0) {
> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
> >>> strerror(ret));
> >>> diff --git a/migration.c b/migration.c
> >>> index 8d675b3..34f2325 100644
> >>> --- a/migration.c
> >>> +++ b/migration.c
> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> >>>                  old_vm_running = runstate_is_running();
> >>>
> >>> +                cpu_synchronize_all_states();
> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >>>                  if (ret >= 0) {
> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
> >
> >
> > It could also be useful to apply the above patch _and_ revert
> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
> >
> > Paolo
> 
> Yes, it solved the issue for me! (it took much time to check because
> most of country` debian mirrors went inconsistent by some reason)
> 
> Also trivial addition:
> 
> diff --git a/migration.c b/migration.c
> index 34f2325..65d1c88 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -25,6 +25,7 @@
>  #include "qemu/thread.h"
>  #include "qmp-commands.h"
>  #include "trace.h"
> +#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf?

That is, test with a stock qemu.git tree plus the patch sent today on
this thread, which moves cpu_synchronize_all_states?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-15 21:09       ` Marcelo Tosatti
  2014-07-15 21:25         ` Andrey Korolyov
@ 2014-07-16  7:35         ` Marcin Gibuła
  2014-07-16 12:00           ` Marcelo Tosatti
  1 sibling, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-16  7:35 UTC (permalink / raw)
  To: Marcelo Tosatti, Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

> Andrey,
>
> Can you please provide instructions on how to create reproducible
> environment?
>
> The following patch is equivalent to the original patch, for
> the purposes of fixing the kvmclock problem.
>
> Perhaps it becomes easier to spot the reason for the hang you are
> experiencing.

Marcelo,

the original reason for the patch adding cpu_synchronize_all_states()
there was that this bug affected non-migration operations as well -
http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html.

Won't moving it to the migration code alone break those cases again?

>
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 272a88a..feb5fc5 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -17,7 +17,6 @@
>   #include "qemu/host-utils.h"
>   #include "sysemu/sysemu.h"
>   #include "sysemu/kvm.h"
> -#include "sysemu/cpus.h"
>   #include "hw/sysbus.h"
>   #include "hw/kvm/clock.h"
>
> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>
>       cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>
> -    assert(time.tsc_timestamp <= migration_tsc);
>       delta = migration_tsc - time.tsc_timestamp;
>       if (time.tsc_shift < 0) {
>           delta >>= -time.tsc_shift;
> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>           if (s->clock_valid) {
>               return;
>           }
> -
> -        cpu_synchronize_all_states();
>           ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>           if (ret < 0) {
>               fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> diff --git a/migration.c b/migration.c
> index 8d675b3..34f2325 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>                   qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>                   old_vm_running = runstate_is_running();
>
> +                cpu_synchronize_all_states();
>                   ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>                   if (ret >= 0) {
>                       qemu_file_set_rate_limit(s->file, INT64_MAX);
>


-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-16  1:16               ` Marcelo Tosatti
@ 2014-07-16  8:38                 ` Andrey Korolyov
  2014-07-16 11:52                   ` Marcelo Tosatti
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-07-16  8:38 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 5507 bytes --]

On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
>> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>> >
>> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
>> >> wrote:
>> >>>
>> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>> >>>>
>> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
>> >>>> wrote:
>> >>>>>
>> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>> >>>>>>>
>> >>>>>>> Hello,
>> >>>>>>>
>> >>>>>>> the issue is not specific to the iothread code because generic
>> >>>>>>> virtio-blk also hangs up:
>> >>>>>>
>> >>>>>>
>> >>>>>> Do you know which version works well?  If you could bisect, that'll
>> >>>>>> help a lot.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>>                 Amit
>> >>>>>
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
>> >>>>> every step takes about 10 minutes to complete.
>> >>>>
>> >>>>
>> >>>> Yay! It is even outside of virtio-blk.
>> >>>>
>> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>> >>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
>> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
>> >>>>
>> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>> >>>> calculation
>> >>>>
>> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
>> >>>>
>> >>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
>> >>>>     Cc: qemu-stable@nongnu.org
>> >>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>> >>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> >>>
>> >>>
>> >>> Andrey,
>> >>>
>> >>> Can you please provide instructions on how to create reproducible
>> >>> environment?
>> >>>
>> >>> The following patch is equivalent to the original patch, for
>> >>> the purposes of fixing the kvmclock problem.
>> >>>
>> >>> Perhaps it becomes easier to spot the reason for the hang you are
>> >>> experiencing.
>> >>>
>> >>>
>> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>> >>> index 272a88a..feb5fc5 100644
>> >>> --- a/hw/i386/kvm/clock.c
>> >>> +++ b/hw/i386/kvm/clock.c
>> >>> @@ -17,7 +17,6 @@
>> >>>  #include "qemu/host-utils.h"
>> >>>  #include "sysemu/sysemu.h"
>> >>>  #include "sysemu/kvm.h"
>> >>> -#include "sysemu/cpus.h"
>> >>>  #include "hw/sysbus.h"
>> >>>  #include "hw/kvm/clock.h"
>> >>>
>> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>> >>>
>> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>> >>>
>> >>> -    assert(time.tsc_timestamp <= migration_tsc);
>> >>>      delta = migration_tsc - time.tsc_timestamp;
>> >>>      if (time.tsc_shift < 0) {
>> >>>          delta >>= -time.tsc_shift;
>> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
>> >>> int running,
>> >>>          if (s->clock_valid) {
>> >>>              return;
>> >>>          }
>> >>> -
>> >>> -        cpu_synchronize_all_states();
>> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>> >>>          if (ret < 0) {
>> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>> >>> strerror(ret));
>> >>> diff --git a/migration.c b/migration.c
>> >>> index 8d675b3..34f2325 100644
>> >>> --- a/migration.c
>> >>> +++ b/migration.c
>> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>> >>>                  old_vm_running = runstate_is_running();
>> >>>
>> >>> +                cpu_synchronize_all_states();
>> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> >>>                  if (ret >= 0) {
>> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
>> >
>> >
>> > It could also be useful to apply the above patch _and_ revert
>> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>> >
>> > Paolo
>>
>> Yes, it solved the issue for me! (it took much time to check because
>> most of country` debian mirrors went inconsistent by some reason)
>>
>> Also trivial addition:
>>
>> diff --git a/migration.c b/migration.c
>> index 34f2325..65d1c88 100644
>> --- a/migration.c
>> +++ b/migration.c
>> @@ -25,6 +25,7 @@
>>  #include "qemu/thread.h"
>>  #include "qmp-commands.h"
>>  #include "trace.h"
>> +#include "sysemu/cpus.h"
>
> And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?
>
> That is, test with a stock qemu.git tree and the patch sent today,
> on this thread, to move cpu_synchronize_all_states ?
>
>

The main reason things work for me is the revert of
9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, without adding any
other patches. I tested two cases: Alexander's patch completely
reverted plus the suggestion from Marcelo, and only 9b178682 reverted
plus the same suggestion. The difference is that while Alexander's
patch is not reverted, live migration always fails by hitting the
timeout value, and once it is reverted, migration always succeeds in
8-10 seconds. The appropriate diffs are attached for reference.

[-- Attachment #2: diff-with-reverted-agraf-patch.txt --]
[-- Type: text/plain, Size: 3271 bytes --]

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..93e1829 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
 
@@ -36,49 +35,6 @@ typedef struct KVMClockState {
     bool clock_valid;
 } KVMClockState;
 
-struct pvclock_vcpu_time_info {
-    uint32_t   version;
-    uint32_t   pad0;
-    uint64_t   tsc_timestamp;
-    uint64_t   system_time;
-    uint32_t   tsc_to_system_mul;
-    int8_t     tsc_shift;
-    uint8_t    flags;
-    uint8_t    pad[2];
-} __attribute__((__packed__)); /* 32 bytes */
-
-static uint64_t kvmclock_current_nsec(KVMClockState *s)
-{
-    CPUState *cpu = first_cpu;
-    CPUX86State *env = cpu->env_ptr;
-    hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
-    uint64_t migration_tsc = env->tsc;
-    struct pvclock_vcpu_time_info time;
-    uint64_t delta;
-    uint64_t nsec_lo;
-    uint64_t nsec_hi;
-    uint64_t nsec;
-
-    if (!(env->system_time_msr & 1ULL)) {
-        /* KVM clock not active */
-        return 0;
-    }
-
-    cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
-
-    assert(time.tsc_timestamp <= migration_tsc);
-    delta = migration_tsc - time.tsc_timestamp;
-    if (time.tsc_shift < 0) {
-        delta >>= -time.tsc_shift;
-    } else {
-        delta <<= time.tsc_shift;
-    }
-
-    mulu64(&nsec_lo, &nsec_hi, delta, time.tsc_to_system_mul);
-    nsec = (nsec_lo >> 32) | (nsec_hi << 32);
-    return nsec + time.system_time;
-}
-
 static void kvmclock_vm_state_change(void *opaque, int running,
                                      RunState state)
 {
@@ -89,15 +45,9 @@ static void kvmclock_vm_state_change(void *opaque, int running,
 
     if (running) {
         struct kvm_clock_data data;
-        uint64_t time_at_migration = kvmclock_current_nsec(s);
 
         s->clock_valid = false;
 
-	/* We can't rely on the migrated clock value, just discard it */
-	if (time_at_migration) {
-	        s->clock = time_at_migration;
-	}
-
         data.clock = s->clock;
         data.flags = 0;
         ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
@@ -125,8 +75,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
         if (s->clock_valid) {
             return;
         }
-
-        cpu_synchronize_all_states();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"
 
 enum {
     MIG_STATE_ERROR = -1,
@@ -608,6 +609,7 @@ static void *migration_thread(void *opaque)
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
                 old_vm_running = runstate_is_running();
 
+                cpu_synchronize_all_states();
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT64_MAX);

[-- Attachment #3: diff-with-only-late-fix-moved.txt --]
[-- Type: text/plain, Size: 1652 bytes --]

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
 
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
 
-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
         if (s->clock_valid) {
             return;
         }
-
-        cpu_synchronize_all_states();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"
 
 enum {
     MIG_STATE_ERROR = -1,
@@ -608,6 +609,7 @@ static void *migration_thread(void *opaque)
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
                 old_vm_running = runstate_is_running();
 
+                cpu_synchronize_all_states();
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT64_MAX);

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-16  8:38                 ` Andrey Korolyov
@ 2014-07-16 11:52                   ` Marcelo Tosatti
  2014-07-16 13:24                     ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Marcelo Tosatti @ 2014-07-16 11:52 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 6252 bytes --]

On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
> >> >
> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
> >> >> wrote:
> >> >>>
> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> >> >>>>
> >> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
> >> >>>>> wrote:
> >> >>>>>>
> >> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> >> >>>>>>>
> >> >>>>>>> Hello,
> >> >>>>>>>
> >> >>>>>>> the issue is not specific to the iothread code because generic
> >> >>>>>>> virtio-blk also hangs up:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Do you know which version works well?  If you could bisect, that'll
> >> >>>>>> help a lot.
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>>                 Amit
> >> >>>>>
> >> >>>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
> >> >>>>> every step takes about 10 minutes to complete.
> >> >>>>
> >> >>>>
> >> >>>> Yay! It is even outside of virtio-blk.
> >> >>>>
> >> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> >> >>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
> >> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
> >> >>>>
> >> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
> >> >>>> calculation
> >> >>>>
> >> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
> >> >>>>
> >> >>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
> >> >>>>     Cc: qemu-stable@nongnu.org
> >> >>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >> >>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> >>>
> >> >>>
> >> >>> Andrey,
> >> >>>
> >> >>> Can you please provide instructions on how to create reproducible
> >> >>> environment?
> >> >>>
> >> >>> The following patch is equivalent to the original patch, for
> >> >>> the purposes of fixing the kvmclock problem.
> >> >>>
> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
> >> >>> experiencing.
> >> >>>
> >> >>>
> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> >> >>> index 272a88a..feb5fc5 100644
> >> >>> --- a/hw/i386/kvm/clock.c
> >> >>> +++ b/hw/i386/kvm/clock.c
> >> >>> @@ -17,7 +17,6 @@
> >> >>>  #include "qemu/host-utils.h"
> >> >>>  #include "sysemu/sysemu.h"
> >> >>>  #include "sysemu/kvm.h"
> >> >>> -#include "sysemu/cpus.h"
> >> >>>  #include "hw/sysbus.h"
> >> >>>  #include "hw/kvm/clock.h"
> >> >>>
> >> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
> >> >>>
> >> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> >> >>>
> >> >>> -    assert(time.tsc_timestamp <= migration_tsc);
> >> >>>      delta = migration_tsc - time.tsc_timestamp;
> >> >>>      if (time.tsc_shift < 0) {
> >> >>>          delta >>= -time.tsc_shift;
> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
> >> >>> int running,
> >> >>>          if (s->clock_valid) {
> >> >>>              return;
> >> >>>          }
> >> >>> -
> >> >>> -        cpu_synchronize_all_states();
> >> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> >> >>>          if (ret < 0) {
> >> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
> >> >>> strerror(ret));
> >> >>> diff --git a/migration.c b/migration.c
> >> >>> index 8d675b3..34f2325 100644
> >> >>> --- a/migration.c
> >> >>> +++ b/migration.c
> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
> >> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> >> >>>                  old_vm_running = runstate_is_running();
> >> >>>
> >> >>> +                cpu_synchronize_all_states();
> >> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >> >>>                  if (ret >= 0) {
> >> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
> >> >
> >> >
> >> > It could also be useful to apply the above patch _and_ revert
> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
> >> >
> >> > Paolo
> >>
> >> Yes, it solved the issue for me! (it took much time to check because
> >> most of country` debian mirrors went inconsistent by some reason)
> >>
> >> Also trivial addition:
> >>
> >> diff --git a/migration.c b/migration.c
> >> index 34f2325..65d1c88 100644
> >> --- a/migration.c
> >> +++ b/migration.c
> >> @@ -25,6 +25,7 @@
> >>  #include "qemu/thread.h"
> >>  #include "qmp-commands.h"
> >>  #include "trace.h"
> >> +#include "sysemu/cpus.h"
> >
> > And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?
> >
> > That is, test with a stock qemu.git tree and the patch sent today,
> > on this thread, to move cpu_synchronize_all_states ?
> >
> >
> 
> The main reason for things to work for me is a revert of
> 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other
> patches. I had tested two cases, with Alexander`s patch completely
> reverted plus suggestion from Marcelo and only with revert 9b178682
> plug same suggestion. The difference is that the until Alexander`
> patch is not reverted, live migration is always failing by the timeout
> value, and when reverted migration always succeeds in 8-10 seconds.
> Appropriate diffs are attached for the reference.

Andrey,

Can you please apply only the following attached patch to an upstream
QEMU git tree (move_synchronize_all_states.patch), plus the necessary
header file corrections, and attempt to reproduce?

When you reproduce, please provide a backtrace (a rough sketch of how
to grab one is below), the version of the QEMU git tree, and
instructions on how to reproduce:

1) full QEMU command line
2) steps to reproduce
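
For the backtrace, something like this on the source host should be
enough (assuming gdb is installed and a single QEMU process; adjust
the pid lookup otherwise):

  gdb -batch -p "$(pidof qemu-system-x86_64)" \
      -ex 'thread apply all backtrace' > qemu-backtrace.txt

and for the tree version:

  git describe --always --dirty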
 


[-- Attachment #2: move_synchronize_all_states.patch --]
[-- Type: text/plain, Size: 1549 bytes --]


Move cpu_synchronize_all_states to migration.c.

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
 
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
 
-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
         if (s->clock_valid) {
             return;
         }
-
-        cpu_synchronize_all_states();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..34f2325 100644
--- a/migration.c
+++ b/migration.c
@@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
                 old_vm_running = runstate_is_running();
 
+                cpu_synchronize_all_states();
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT64_MAX);


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-16  7:35         ` Marcin Gibuła
@ 2014-07-16 12:00           ` Marcelo Tosatti
  0 siblings, 0 replies; 76+ messages in thread
From: Marcelo Tosatti @ 2014-07-16 12:00 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Amit Shah, Paolo Bonzini, Andrey Korolyov, Fam Zheng,
	qemu-devel@nongnu.org

On Wed, Jul 16, 2014 at 09:35:16AM +0200, Marcin Gibuła wrote:
> >Andrey,
> >
> >Can you please provide instructions on how to create reproducible
> >environment?
> >
> >The following patch is equivalent to the original patch, for
> >the purposes of fixing the kvmclock problem.
> >
> >Perhaps it becomes easier to spot the reason for the hang you are
> >experiencing.
> 
> Marcelo,
> 
> the original reason for patch adding cpu_synchronize_all_states()
> there was because this bug affected non-migration operations as well
> -
> http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html.
> 
> Won't moving it only to migration code break these things again?

Yes - its just for debug purposes.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-16 11:52                   ` Marcelo Tosatti
@ 2014-07-16 13:24                     ` Andrey Korolyov
  2014-07-16 18:25                       ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-07-16 13:24 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
>> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
>> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>> >> >
>> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>> >> >>>>
>> >> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
>> >> >>>>> wrote:
>> >> >>>>>>
>> >> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>> >> >>>>>>>
>> >> >>>>>>> Hello,
>> >> >>>>>>>
>> >> >>>>>>> the issue is not specific to the iothread code because generic
>> >> >>>>>>> virtio-blk also hangs up:
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> Do you know which version works well?  If you could bisect, that'll
>> >> >>>>>> help a lot.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>>                 Amit
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Hi,
>> >> >>>>>
>> >> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
>> >> >>>>> every step takes about 10 minutes to complete.
>> >> >>>>
>> >> >>>>
>> >> >>>> Yay! It is even outside of virtio-blk.
>> >> >>>>
>> >> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>> >> >>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
>> >> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
>> >> >>>>
>> >> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>> >> >>>> calculation
>> >> >>>>
>> >> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
>> >> >>>>
>> >> >>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
>> >> >>>>     Cc: qemu-stable@nongnu.org
>> >> >>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>> >> >>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> >> >>>
>> >> >>>
>> >> >>> Andrey,
>> >> >>>
>> >> >>> Can you please provide instructions on how to create reproducible
>> >> >>> environment?
>> >> >>>
>> >> >>> The following patch is equivalent to the original patch, for
>> >> >>> the purposes of fixing the kvmclock problem.
>> >> >>>
>> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
>> >> >>> experiencing.
>> >> >>>
>> >> >>>
>> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>> >> >>> index 272a88a..feb5fc5 100644
>> >> >>> --- a/hw/i386/kvm/clock.c
>> >> >>> +++ b/hw/i386/kvm/clock.c
>> >> >>> @@ -17,7 +17,6 @@
>> >> >>>  #include "qemu/host-utils.h"
>> >> >>>  #include "sysemu/sysemu.h"
>> >> >>>  #include "sysemu/kvm.h"
>> >> >>> -#include "sysemu/cpus.h"
>> >> >>>  #include "hw/sysbus.h"
>> >> >>>  #include "hw/kvm/clock.h"
>> >> >>>
>> >> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>> >> >>>
>> >> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>> >> >>>
>> >> >>> -    assert(time.tsc_timestamp <= migration_tsc);
>> >> >>>      delta = migration_tsc - time.tsc_timestamp;
>> >> >>>      if (time.tsc_shift < 0) {
>> >> >>>          delta >>= -time.tsc_shift;
>> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
>> >> >>> int running,
>> >> >>>          if (s->clock_valid) {
>> >> >>>              return;
>> >> >>>          }
>> >> >>> -
>> >> >>> -        cpu_synchronize_all_states();
>> >> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>> >> >>>          if (ret < 0) {
>> >> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>> >> >>> strerror(ret));
>> >> >>> diff --git a/migration.c b/migration.c
>> >> >>> index 8d675b3..34f2325 100644
>> >> >>> --- a/migration.c
>> >> >>> +++ b/migration.c
>> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>> >> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>> >> >>>                  old_vm_running = runstate_is_running();
>> >> >>>
>> >> >>> +                cpu_synchronize_all_states();
>> >> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> >> >>>                  if (ret >= 0) {
>> >> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
>> >> >
>> >> >
>> >> > It could also be useful to apply the above patch _and_ revert
>> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>> >> >
>> >> > Paolo
>> >>
>> >> Yes, it solved the issue for me! (it took much time to check because
>> >> most of country` debian mirrors went inconsistent by some reason)
>> >>
>> >> Also trivial addition:
>> >>
>> >> diff --git a/migration.c b/migration.c
>> >> index 34f2325..65d1c88 100644
>> >> --- a/migration.c
>> >> +++ b/migration.c
>> >> @@ -25,6 +25,7 @@
>> >>  #include "qemu/thread.h"
>> >>  #include "qmp-commands.h"
>> >>  #include "trace.h"
>> >> +#include "sysemu/cpus.h"
>> >
>> > And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?
>> >
>> > That is, test with a stock qemu.git tree and the patch sent today,
>> > on this thread, to move cpu_synchronize_all_states ?
>> >
>> >
>>
>> The main reason for things to work for me is a revert of
>> 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other
>> patches. I had tested two cases, with Alexander`s patch completely
>> reverted plus suggestion from Marcelo and only with revert 9b178682
>> plug same suggestion. The difference is that the until Alexander`
>> patch is not reverted, live migration is always failing by the timeout
>> value, and when reverted migration always succeeds in 8-10 seconds.
>> Appropriate diffs are attached for the reference.
>
> Andrey,
>
> Can you please apply only the following attached patch to an upstream
> QEMU git tree (move_synchronize_all_states.patch), plus the necessary
> header file corrections, and attempt to reproduce?
>
> When you reproduce, please provide a backtrace and version of the QEMU
> git tree, and instructions on how to reproduce:
>
> 1) full QEMU command line
> 2) steps to reproduce
>
>

Marcelo, as far as I can see, this patch matches the second case from
my previous message exactly (the diffs there are against the generic
rc). I/O is not locking up there, but live migration fails and libvirt
moves a frozen state. I can try to run the same on top of rc2, but it
will probably be the same.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-16 13:24                     ` Andrey Korolyov
@ 2014-07-16 18:25                       ` Andrey Korolyov
  2014-07-16 21:28                         ` Marcin Gibuła
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-07-16 18:25 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Marcin Gibuła, Fam Zheng,
	qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 7899 bytes --]

On Wed, Jul 16, 2014 at 5:24 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
>>> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
>>> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>>> >> >
>>> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>>> >> >>>>
>>> >> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
>>> >> >>>> wrote:
>>> >> >>>>>
>>> >> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
>>> >> >>>>> wrote:
>>> >> >>>>>>
>>> >> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>>> >> >>>>>>>
>>> >> >>>>>>> Hello,
>>> >> >>>>>>>
>>> >> >>>>>>> the issue is not specific to the iothread code because generic
>>> >> >>>>>>> virtio-blk also hangs up:
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> Do you know which version works well?  If you could bisect, that'll
>>> >> >>>>>> help a lot.
>>> >> >>>>>>
>>> >> >>>>>> Thanks,
>>> >> >>>>>>                 Amit
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> Hi,
>>> >> >>>>>
>>> >> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
>>> >> >>>>> every step takes about 10 minutes to complete.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> Yay! It is even outside of virtio-blk.
>>> >> >>>>
>>> >> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>>> >> >>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
>>> >> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
>>> >> >>>>
>>> >> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>>> >> >>>> calculation
>>> >> >>>>
>>> >> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
>>> >> >>>>
>>> >> >>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
>>> >> >>>>     Cc: qemu-stable@nongnu.org
>>> >> >>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>> >> >>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> >> >>>
>>> >> >>>
>>> >> >>> Andrey,
>>> >> >>>
>>> >> >>> Can you please provide instructions on how to create reproducible
>>> >> >>> environment?
>>> >> >>>
>>> >> >>> The following patch is equivalent to the original patch, for
>>> >> >>> the purposes of fixing the kvmclock problem.
>>> >> >>>
>>> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
>>> >> >>> experiencing.
>>> >> >>>
>>> >> >>>
>>> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>>> >> >>> index 272a88a..feb5fc5 100644
>>> >> >>> --- a/hw/i386/kvm/clock.c
>>> >> >>> +++ b/hw/i386/kvm/clock.c
>>> >> >>> @@ -17,7 +17,6 @@
>>> >> >>>  #include "qemu/host-utils.h"
>>> >> >>>  #include "sysemu/sysemu.h"
>>> >> >>>  #include "sysemu/kvm.h"
>>> >> >>> -#include "sysemu/cpus.h"
>>> >> >>>  #include "hw/sysbus.h"
>>> >> >>>  #include "hw/kvm/clock.h"
>>> >> >>>
>>> >> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>>> >> >>>
>>> >> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>>> >> >>>
>>> >> >>> -    assert(time.tsc_timestamp <= migration_tsc);
>>> >> >>>      delta = migration_tsc - time.tsc_timestamp;
>>> >> >>>      if (time.tsc_shift < 0) {
>>> >> >>>          delta >>= -time.tsc_shift;
>>> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
>>> >> >>> int running,
>>> >> >>>          if (s->clock_valid) {
>>> >> >>>              return;
>>> >> >>>          }
>>> >> >>> -
>>> >> >>> -        cpu_synchronize_all_states();
>>> >> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>>> >> >>>          if (ret < 0) {
>>> >> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>>> >> >>> strerror(ret));
>>> >> >>> diff --git a/migration.c b/migration.c
>>> >> >>> index 8d675b3..34f2325 100644
>>> >> >>> --- a/migration.c
>>> >> >>> +++ b/migration.c
>>> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>>> >> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>>> >> >>>                  old_vm_running = runstate_is_running();
>>> >> >>>
>>> >> >>> +                cpu_synchronize_all_states();
>>> >> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>> >> >>>                  if (ret >= 0) {
>>> >> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
>>> >> >
>>> >> >
>>> >> > It could also be useful to apply the above patch _and_ revert
>>> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>>> >> >
>>> >> > Paolo
>>> >>
>>> >> Yes, it solved the issue for me! (it took much time to check because
>>> >> most of country` debian mirrors went inconsistent by some reason)
>>> >>
>>> >> Also trivial addition:
>>> >>
>>> >> diff --git a/migration.c b/migration.c
>>> >> index 34f2325..65d1c88 100644
>>> >> --- a/migration.c
>>> >> +++ b/migration.c
>>> >> @@ -25,6 +25,7 @@
>>> >>  #include "qemu/thread.h"
>>> >>  #include "qmp-commands.h"
>>> >>  #include "trace.h"
>>> >> +#include "sysemu/cpus.h"
>>> >
>>> > And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?
>>> >
>>> > That is, test with a stock qemu.git tree and the patch sent today,
>>> > on this thread, to move cpu_synchronize_all_states ?
>>> >
>>> >
>>>
>>> The main reason for things to work for me is a revert of
>>> 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other
>>> patches. I had tested two cases, with Alexander`s patch completely
>>> reverted plus suggestion from Marcelo and only with revert 9b178682
>>> plug same suggestion. The difference is that the until Alexander`
>>> patch is not reverted, live migration is always failing by the timeout
>>> value, and when reverted migration always succeeds in 8-10 seconds.
>>> Appropriate diffs are attached for the reference.
>>
>> Andrey,
>>
>> Can you please apply only the following attached patch to an upstream
>> QEMU git tree (move_synchronize_all_states.patch), plus the necessary
>> header file corrections, and attempt to reproduce?
>>
>> When you reproduce, please provide a backtrace and version of the QEMU
>> git tree, and instructions on how to reproduce:
>>
>> 1) full QEMU command line
>> 2) steps to reproduce
>>
>>
>
> Marcelo, as I can see, this patch resembles second case from my
> previous message exactly (there are diffs from the generic rc). I/O is
> not locking up there but live migration failing and libvirt moves a
> freezed state. I can try to run the same on top of rc2, but it`ll be
> probably the same.


Tested on an iscsi pool, though there is a no-cache requirement
there: rbd with cache disabled may survive one migration, but the
iscsi backend always hangs. As before, just rolling back the
problematic commit fixes the problem, while adding
cpu_synchronize_all_states to migration.c makes no visible difference
in the VM's behavior. The problem consists of at least two separate
ones: the current hang, and the behavior with the unreverted patch
from agraf - the latter causes live migration with writeback cache to
fail, while cache=none works well in any variant that survives the
first condition. Marcin, would you mind checking the current state of
the problem on your environments in your spare time? It is probably
easier to reproduce on iscsi because it takes far less time to set
up; the command line and libvirt config are attached (v2.1.0-rc2 plus
iscsi-1.11.0), and a sketch of the initiator-side setup follows.
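
For completeness, the initiator side boils down to a discovery and a
login against the portal from the attached command line, with the fio
job file (also attached; saved e.g. as random.fio) run inside the
guest against /data on the virtio disk. A sketch, using open-iscsi
syntax (portal and IQN taken from the by-path device name):

  iscsiadm -m discovery -t sendtargets -p 10.6.0.1
  iscsiadm -m node -T iqn.2014-05.ru.flops:kvmguest -p 10.6.0.1 --login
  # inside the guest: create and mount a filesystem on /data, then:
  fio random.fio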

[-- Attachment #2: cli.txt --]
[-- Type: text/plain, Size: 1426 bytes --]

qemu-system-x86_64 -enable-kvm -name vm27842 -S -machine pc-i440fx-2.1,accel=kvm,usb=off -m 256 -realtime mlock=off -smp 12,sockets=1,cores=12,threads=12 -numa node,nodeid=0,cpus=0,mem=256 -uuid 9a44bdae-5702-463b-aa1e-d8d85055f6af -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm27842.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/disk/by-path/ip-10.6.0.1:3260-iscsi-iqn.2014-05.ru.flops:kvmguest-lun-1,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:10:03:40,bus=pci.0,addr=0x3 -netdev tap,fd=25,id=hostnet1,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:10:03:41,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/vm27842.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1

[-- Attachment #3: libvirt-config.xml --]
[-- Type: text/xml, Size: 2881 bytes --]

<domain type='kvm'>
  <name>vm27842</name>
  <uuid>9a44bdae-5702-463b-aa1e-d8d85055f6af</uuid>
  <description>3145728</description>
  <memory unit='KiB'>262144</memory>
  <currentMemory unit='KiB'>262144</currentMemory>
  <vcpu placement='static' cpuset='0-11'>12</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.1'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu>
    <topology sockets='1' cores='12' threads='12'/>
    <numa>
      <cell cpus='0' memory='262144'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup' track='guest'>
      <catchup threshold='123' slew='120' limit='10000'/>
    </timer>
    <timer name='pit' tickpolicy='delay'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/disk/by-path/ip-10.6.0.1:3260-iscsi-iqn.2014-05.ru.flops:kvmguest-lun-1'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </disk>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='bridge'>
      <mac address='52:54:00:10:03:40'/>
      <source bridge='oswbr0'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='0bf5d1b1-a972-40e4-839d-067730f4f20d'/>
      </virtualport>
      <target dev='vp27842l'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:10:03:41'/>
      <source bridge='oswbr0'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='f7ebddab-6f4d-4e01-8a5a-59ee3e434c92'/>
      </virtualport>
      <target dev='vp27842p'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/vm27842.sock'/>
      <target type='virtio' name='org.qemu.guest_agent.1'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <memballoon model='none'/>
  </devices>
  <seclabel type='none'/>
</domain>


[-- Attachment #4: fio.txt --]
[-- Type: text/plain, Size: 148 bytes --]

[random]
rw=randrw
size=768M
directory=/data
iodepth=32
fsync=1
direct=1
blocksize=2M
numjobs=1
nrfiles=4
group_reporting
ioengine=libaio
loops=512

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-16 18:25                       ` Andrey Korolyov
@ 2014-07-16 21:28                         ` Marcin Gibuła
  2014-07-16 21:36                           ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-16 21:28 UTC (permalink / raw)
  To: Andrey Korolyov, Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org

> Tested on an iscsi pool, though there is a no-cache requirement: rbd
> with the cache disabled may survive one migration, but the iscsi backend
> always hangs. As before, just rolling back the problematic commit fixes
> the problem, and adding cpu_synchronize_all_states to migration.c makes
> no visible difference in the VM's behavior. The problem consists of at
> least two separate ones: the current hang, and the behavior with the
> unreverted patch from agraf - the latter causes live migration with
> writeback cache to fail, while cache=none works well in any variant that
> survives the first condition. Marcin, would you mind checking the
> current state of the problem on your environments in your spare time?
> It is probably easier to reproduce on iscsi because it takes far less
> time to set up; command line and libvirt config are attached
> (v2.1.0-rc2 plus iscsi-1.11.0).

Ok, but what exactly do you want me to test?

Just to avoid any confusion, originally there were two problems with 
kvmclock:

1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes a problem where 
clock drift (?) caused kvmclock in the guest to report a time in the 
past, which caused the guest kernel to hang. This is hard to reproduce 
reliably (probably because it takes a long time for drift to accumulate).

2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes a regression 
caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occurred during 
non-migration operations (drive-mirror + pivot) and also caused the 
guest kernel to hang. This is trivial to reproduce.

I'm using both of them applied on top of 2.0 in production and have no 
problems with them. I'm using NFS exclusively with cache=none.

So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no 
extra patches applied or reverted, on a VM that is running fio, am I correct?

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-16 21:28                         ` Marcin Gibuła
@ 2014-07-16 21:36                           ` Andrey Korolyov
  2014-07-17  9:49                             ` Marcin Gibuła
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-07-16 21:36 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

On Thu, Jul 17, 2014 at 1:28 AM, Marcin Gibuła <m.gibula@beyond.pl> wrote:
>> Tested on an iscsi pool, though there is a no-cache requirement: rbd
>> with the cache disabled may survive one migration, but the iscsi backend
>> always hangs. As before, just rolling back the problematic commit fixes
>> the problem, and adding cpu_synchronize_all_states to migration.c makes
>> no visible difference in the VM's behavior. The problem consists of at
>> least two separate ones: the current hang, and the behavior with the
>> unreverted patch from agraf - the latter causes live migration with
>> writeback cache to fail, while cache=none works well in any variant that
>> survives the first condition. Marcin, would you mind checking the
>> current state of the problem on your environments in your spare time?
>> It is probably easier to reproduce on iscsi because it takes far less
>> time to set up; command line and libvirt config are attached
>> (v2.1.0-rc2 plus iscsi-1.11.0).
>
>
> Ok, but what exactly do you want me to test?
>
> Just to avoid any confusion, originally there were two problems with
> kvmclock:
>
> 1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes a problem where
> clock drift (?) caused kvmclock in the guest to report a time in the past,
> which caused the guest kernel to hang. This is hard to reproduce reliably
> (probably because it takes a long time for drift to accumulate).
>
> 2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes a regression
> caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occurred during
> non-migration operations (drive-mirror + pivot) and also caused the guest
> kernel to hang. This is trivial to reproduce.
>
> I'm using both of them applied on top of 2.0 in production and have no
> problems with them. I'm using NFS exclusively with cache=none.
>
> So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no
> extra patches applied or reverted, on a VM that is running fio, am I correct?
>

Yes, exactly. An ISCSI-based setup can take some minutes to deploy,
given a prepared image, and I have a one hundred percent hit rate for
the original issue with it.

> --
> mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-16 21:36                           ` Andrey Korolyov
@ 2014-07-17  9:49                             ` Marcin Gibuła
  2014-07-17 11:20                               ` Marcin Gibuła
  2014-07-17 11:54                               ` Marcin Gibuła
  0 siblings, 2 replies; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-17  9:49 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]

>> I'm using both of them applied on top of 2.0 in production and have no
>> problems with them. I'm using NFS exclusively with cache=none.
>>
>> So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no
>> extra patches applied or reverted, on a VM that is running fio, am I correct?
>>
>
> Yes, exactly. An ISCSI-based setup can take some minutes to deploy, given
> a prepared image, and I have a one hundred percent hit rate for the
> original issue with it.

I've reproduced your IO hang with 2.0 and both 
9b1786829aefb83f37a8f3135e3ea91c56001b56 and 
a096b3a6732f846ec57dc28b47ee9435aa0609bf applied.

Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the 
problem (but reintroduces the block-migration hang). It seems like a 
qemu bug rather than a guest problem, as the no-kvmclock parameter makes 
no difference. IO just stops, all qemu IO threads die off. Almost like 
it forgets to migrate them :-)

I'm attaching backtraces from the guest kernel and qemu, plus the qemu command line.

Going to compile 2.1-rc.

-- 
mg

[-- Attachment #2: guest-backtrace.txt --]
[-- Type: text/plain, Size: 3866 bytes --]

[  254.634525] SysRq : Show Blocked State
[  254.635041]   task                        PC stack   pid father
[  254.635304] kworker/0:2     D ffff88013fc145c0     0    83      2 0x00000000
[  254.635304] Workqueue: xfs-log/vdb xfs_log_worker [xfs]
[  254.635304]  ffff880136bdfa58 0000000000000046 ffff880136bdffd8 00000000000145c0
[  254.635304]  ffff880136bdffd8 00000000000145c0 ffff880136ad8000 ffff88013fc14e88
[  254.635304]  ffff880037bd4380 ffff880037bc5068 ffff880037bd43b0 ffff880037bd4380
[  254.635304] Call Trace:
[  254.635304]  [<ffffffff815e797d>] io_schedule+0x9d/0x140
[  254.635304]  [<ffffffff812921d5>] get_request+0x1b5/0x790
[  254.635304]  [<ffffffff81086ab0>] ? wake_up_bit+0x30/0x30
[  254.635304]  [<ffffffff81294236>] blk_queue_bio+0x96/0x390
[  254.635304]  [<ffffffff812904e2>] generic_make_request+0xe2/0x130
[  254.635304]  [<ffffffff812905a1>] submit_bio+0x71/0x150
[  254.635304]  [<ffffffff811e72c8>] ? bio_alloc_bioset+0x1e8/0x2e0
[  254.635304]  [<ffffffffa03310bb>] _xfs_buf_ioapply+0x2bb/0x3d0 [xfs]
[  254.635304]  [<ffffffffa038d3ef>] ? xlog_bdstrat+0x1f/0x50 [xfs]
[  254.635304]  [<ffffffffa03328e6>] xfs_buf_iorequest+0x46/0xa0 [xfs]
[  254.635304]  [<ffffffffa038d3ef>] xlog_bdstrat+0x1f/0x50 [xfs]
[  254.635304]  [<ffffffffa038f135>] xlog_sync+0x265/0x450 [xfs]
[  254.635304]  [<ffffffffa038f3b2>] xlog_state_release_iclog+0x92/0xb0 [xfs]
[  254.635304]  [<ffffffffa039016a>] _xfs_log_force+0x15a/0x290 [xfs]
[  254.635304]  [<ffffffff810115d6>] ? __switch_to+0x136/0x490
[  254.635304]  [<ffffffffa03902c6>] xfs_log_force+0x26/0x80 [xfs]
[  254.635304]  [<ffffffffa0390344>] xfs_log_worker+0x24/0x50 [xfs]
[  254.635304]  [<ffffffff8107e02b>] process_one_work+0x17b/0x460
[  254.635304]  [<ffffffff8107edfb>] worker_thread+0x11b/0x400
[  254.635304]  [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
[  254.635304]  [<ffffffff81085aef>] kthread+0xcf/0xe0
[  254.635304]  [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[  254.635304]  [<ffffffff815f24ec>] ret_from_fork+0x7c/0xb0
[  254.635304]  [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[  254.635304] fio             D ffff88013fc145c0     0   772    770 0x00000000
[  254.635304]  ffff8800bba4b8c8 0000000000000082 ffff8800bba4bfd8 00000000000145c0
[  254.635304]  ffff8800bba4bfd8 00000000000145c0 ffff8801376ff1c0 ffff88013fc14e88
[  254.635304]  ffff880037bd4380 ffff880037baba90 ffff880037bd43b0 ffff880037bd4380
[  254.635304] Call Trace:
[  254.635304]  [<ffffffff815e797d>] io_schedule+0x9d/0x140
[  254.635304]  [<ffffffff812921d5>] get_request+0x1b5/0x790
[  254.635304]  [<ffffffff81086ab0>] ? wake_up_bit+0x30/0x30
[  254.635304]  [<ffffffff81294236>] blk_queue_bio+0x96/0x390
[  254.635304]  [<ffffffff812904e2>] generic_make_request+0xe2/0x130
[  254.635304]  [<ffffffff812905a1>] submit_bio+0x71/0x150
[  254.635304]  [<ffffffff811ed26c>] do_blockdev_direct_IO+0x14bc/0x2620
[  254.635304]  [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[  254.635304]  [<ffffffff811ee425>] __blockdev_direct_IO+0x55/0x60
[  254.635304]  [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[  254.635304]  [<ffffffffa032aaec>] xfs_vm_direct_IO+0x15c/0x180 [xfs]
[  254.635304]  [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[  254.635304]  [<ffffffff81143563>] generic_file_aio_read+0x6d3/0x750
[  254.635304]  [<ffffffff810b69c8>] ? ktime_get_ts+0x48/0xe0
[  254.635304]  [<ffffffff811030cf>] ? delayacct_end+0x8f/0xb0
[  254.635304]  [<ffffffff815e6a32>] ? down_read+0x12/0x30
[  254.635304]  [<ffffffffa0337224>] xfs_file_aio_read+0x154/0x2e0 [xfs]
[  254.635304]  [<ffffffffa03370d0>] ? xfs_file_splice_read+0x140/0x140 [xfs]
[  254.635304]  [<ffffffff811fd6a8>] do_io_submit+0x3b8/0x840
[  254.635304]  [<ffffffff811fdb40>] SyS_io_submit+0x10/0x20
[  254.635304]  [<ffffffff815f2599>] system_call_fastpath+0x16/0x1b


[-- Attachment #3: qemu-backtrace.txt --]
[-- Type: text/plain, Size: 2514 bytes --]

Thread 3 (Thread 0x7f4250f50700 (LWP 11955)):
#0  0x00007f4253d1a897 in ioctl () from /lib64/libc.so.6
#1  0x00007f4257f8adf9 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7f4258e2aa90, type=type@entry=44672)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1796
#2  0x00007f4257f8af35 in kvm_cpu_exec (cpu=cpu@entry=0x7f4258e2aa90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1681
#3  0x00007f4257f3071c in qemu_kvm_cpu_thread_fn (arg=0x7f4258e2aa90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#4  0x00007f4253fe8f3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4253d22dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f424b5ff700 (LWP 11957)):
#0  0x00007f4253fecd0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f425802c019 in qemu_cond_wait (cond=cond@entry=0x7f4258f0cfc0, mutex=mutex@entry=0x7f4258f0cff0)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:135
#2  0x00007f4257f2070b in vnc_worker_thread_loop (queue=queue@entry=0x7f4258f0cfc0)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:222
#3  0x00007f4257f20ae0 in vnc_worker_thread (arg=0x7f4258f0cfc0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:323
#4  0x00007f4253fe8f3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4253d22dad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f4257cc6900 (LWP 11952)):
#0  0x00007f4253d19286 in ppoll () from /lib64/libc.so.6
#1  0x00007f4257eecd79 in ppoll (__ss=0x0, __timeout=0x7ffffc03af40, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=883000000)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:316
#3  0x00007f4257eb02d4 in os_host_main_loop_wait (timeout=883000000) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:229
#4  main_loop_wait (nonblocking=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:484
#5  0x00007f4257d7c05e in main_loop () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:2051
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:4507


[-- Attachment #4: qemu-cmdline.txt --]
[-- Type: text/plain, Size: 2522 bytes --]

/usr/bin/qemu-system-x86_64 -machine accel=kvm -name 21eae881-5e6f-4d13-9b7d-0b8279aed737 -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,+kvmclock -m 4096 -realtime mlock=on -smp 4,sockets=2,cores=10,threads=1 -uuid 21eae881-5e6f-4d13-9b7d-0b8279aed737 -smbios type=0,vendor=HAL 9000 -smbios type=1,manufacturer=testcloud -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -no-hpet -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot order=dc,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/nfs/volumes/e919ceff-8344-4de5-82da-db49a20c4c87/active.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=threads,bps_rd=68157440,bps_wr=68157440,iops_rd=325,iops_wr=325 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/mnt/nfs/volumes/f2fb6c59-2960-4976-aaa1-6154f55f6a66/active.qcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none,aio=threads,bps_rd=68157440,bps_wr=68157440,iops_rd=325,iops_wr=325 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:07:6f:fb,bus=pci.0,addr=0x3 -netdev tap,fd=25,id=hostnet1,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:39:21:d3,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.testcloud.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.testcloud.guest_agent.1 -device usb-tablet,id=input0 -vnc 0.0.0.0:1,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming tcp:0.0.0.0:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox on -device pvpanic

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17  9:49                             ` Marcin Gibuła
@ 2014-07-17 11:20                               ` Marcin Gibuła
  2014-07-17 11:54                               ` Marcin Gibuła
  1 sibling, 0 replies; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-17 11:20 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

> I've reproduced your IO hang with 2.0 and both
> 9b1786829aefb83f37a8f3135e3ea91c56001b56 and
> a096b3a6732f846ec57dc28b47ee9435aa0609bf applied.
>
> Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the
> problem (but reintroduces the block-migration hang). It seems like a
> qemu bug rather than a guest problem, as the no-kvmclock parameter makes
> no difference. IO just stops, all qemu IO threads die off. Almost like
> it forgets to migrate them :-)

Some more info:

a) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 + 
a096b3a6732f846ec57dc28b47ee9435aa0609bf = hangs

b) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 = works

c) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 + move 
cpu_synchronize_state to migration.c = works

Tested with NFS (qcow2) + cache=none.

IO is dead only for the disk that was being written to during migration.
I.e. if my test VM has two disks, vda and vdb, and I'm running fio on 
vdb and it hangs after migration, I can still issue writes to vda.

Recreation steps:
1. Create VM
2. Run fio (Andrey's config)
3. Live migrate the VM a couple of times.

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17  9:49                             ` Marcin Gibuła
  2014-07-17 11:20                               ` Marcin Gibuła
@ 2014-07-17 11:54                               ` Marcin Gibuła
  2014-07-17 12:06                                 ` Andrey Korolyov
  1 sibling, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-17 11:54 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

>> Yes, exactly. ISCSI-based setup can take some minutes to deploy, given
>> prepared image, and I have one hundred percent hit rate for the
>> original issue with it.
>
> I've reproduced your IO hang with 2.0 and both
> 9b1786829aefb83f37a8f3135e3ea91c56001b56 and
> a096b3a6732f846ec57dc28b47ee9435aa0609bf applied.
>
> Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the
> problem (but reintroduces the block-migration hang). It seems like a
> qemu bug rather than a guest problem, as the no-kvmclock parameter makes
> no difference. IO just stops, all qemu IO threads die off. Almost like
> it forgets to migrate them :-)
>
> I'm attaching backtraces from the guest kernel and qemu, plus the qemu command line.
>
> Going to compile 2.1-rc.

2.1-rc2 behaves exactly the same.

Interestingly enough, resetting the guest system causes I/O to work 
again. So it's not qemu that hangs on IO; rather, it fails to notify the 
guest about completed operations that were issued during migration.

And it's somehow caused by calling cpu_synchronize_all_states() inside 
kvmclock_vm_state_change().
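
For reference, the stop branch of kvmclock_vm_state_change() in
hw/i386/kvm/clock.c looks roughly like this (trimmed and from memory of
the 2.1-rc2 tree, so treat it as a sketch rather than an exact quote):

    static void kvmclock_vm_state_change(void *opaque, int running,
                                         RunState state)
    {
        KVMClockState *s = opaque;
        ...
        if (!running) {
            struct kvm_clock_data data;
            int ret;

            if (s->clock_valid) {
                return;
            }

            /* the call in question; as a side effect it marks every
             * vcpu state "dirty" (i.e. the qemu copy is treated as
             * newer than the kernel one) */
            cpu_synchronize_all_states();

            ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
            if (ret < 0) {
                fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
                abort();
            }
            s->clock = data.clock;
            s->clock_valid = true;
        }
    }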



As for testing with cache=writeback, I'll try to set up some iscsi to 
test it.

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17 11:54                               ` Marcin Gibuła
@ 2014-07-17 12:06                                 ` Andrey Korolyov
  2014-07-17 13:25                                   ` Marcin Gibuła
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-07-17 12:06 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

On Thu, Jul 17, 2014 at 3:54 PM, Marcin Gibuła <m.gibula@beyond.pl> wrote:
>>> Yes, exactly. ISCSI-based setup can take some minutes to deploy, given
>>> prepared image, and I have one hundred percent hit rate for the
>>> original issue with it.
>>
>>
>> I've reproduced your IO hang with 2.0 and both
>> 9b1786829aefb83f37a8f3135e3ea91c56001b56 and
>> a096b3a6732f846ec57dc28b47ee9435aa0609bf applied.
>>
>> Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the
>> problem (but reintroduces the block-migration hang). It seems like a
>> qemu bug rather than a guest problem, as the no-kvmclock parameter makes
>> no difference. IO just stops, all qemu IO threads die off. Almost like
>> it forgets to migrate them :-)
>>
>> I'm attaching backtraces from the guest kernel and qemu, plus the qemu command line.
>>
>> Going to compile 2.1-rc.
>
>
> 2.1-rc2 behaves exactly the same.
>
> Interestingly enough, resetting the guest system causes I/O to work
> again. So it's not qemu that hangs on IO; rather, it fails to notify the
> guest about completed operations that were issued during migration.
>
> And it's somehow caused by calling cpu_synchronize_all_states() inside
> kvmclock_vm_state_change().
>
>
>
> As for testing with cache=writeback, I'll try to set up some iscsi to
> test it.

Awesome, thanks! AFAIK you won't be able to use a write cache with
iscsi for migration. A VM which has been reset beforehand always hangs,
while freshly launched ones have a chance to migrate successfully. And
yes, at a glance it looks like the lower layer forgets to notify the
driver about some operations.

>
> --
> mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17 12:06                                 ` Andrey Korolyov
@ 2014-07-17 13:25                                   ` Marcin Gibuła
  2014-07-17 19:18                                     ` Dr. David Alan Gilbert
                                                       ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-17 13:25 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]

>> 2.1-rc2 behaves exactly the same.
>>
>> Interestingly enough, resetting the guest system causes I/O to work
>> again. So it's not qemu that hangs on IO; rather, it fails to notify the
>> guest about completed operations that were issued during migration.
>>
>> And it's somehow caused by calling cpu_synchronize_all_states() inside
>> kvmclock_vm_state_change().
>>
>>
>>
>> As for testing with cache=writeback, I'll try to set up some iscsi to
>> test it.
>
> Awesome, thanks! AFAIK you won't be able to use a write cache with
> iscsi for migration. A VM which has been reset beforehand always hangs,
> while freshly launched ones have a chance to migrate successfully. And
> yes, at a glance it looks like the lower layer forgets to notify the
> driver about some operations.

Andrey,

could you try the attached patch? It's an incredibly ugly workaround 
that calls cpu_synchronize_all_states() in a way that bypasses the lazy 
execution logic.

But it works for me. If it works for you as well, it's somehow related 
to the lazy execution of cpu_synchronize_all_states.
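
To see what it is bypassing: the stock lazy path in kvm-all.c looks
more or less like this (2.1-rc2, trimmed):

    static void do_kvm_cpu_synchronize_state(void *arg)
    {
        CPUState *cpu = arg;

        if (!cpu->kvm_vcpu_dirty) {
            kvm_arch_get_registers(cpu);
            cpu->kvm_vcpu_dirty = true;
        }
    }

    void kvm_cpu_synchronize_state(CPUState *cpu)
    {
        if (!cpu->kvm_vcpu_dirty) {
            run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
        }
    }

i.e. once kvm_vcpu_dirty is set, every later synchronize call is a
no-op until the next vcpu entry; the _always variant in the attached
patch simply skips both checks (and deliberately does not set the dirty
flag).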

-- 
mg

[-- Attachment #2: io-hang.patch --]
[-- Type: text/x-patch, Size: 3151 bytes --]

diff -ru qemu-2.1.0-rc2/cpus.c qemu-2.1.0-rc2-fixed/cpus.c
--- qemu-2.1.0-rc2/cpus.c	2014-07-15 23:49:14.000000000 +0200
+++ qemu-2.1.0-rc2-fixed/cpus.c	2014-07-17 15:09:09.306696284 +0200
@@ -505,6 +505,15 @@
     }
 }
 
+void cpu_synchronize_all_states_always(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        cpu_synchronize_state_always(cpu);
+    }
+}
+
 void cpu_synchronize_all_post_reset(void)
 {
     CPUState *cpu;
diff -ru qemu-2.1.0-rc2/hw/i386/kvm/clock.c qemu-2.1.0-rc2-fixed/hw/i386/kvm/clock.c
--- qemu-2.1.0-rc2/hw/i386/kvm/clock.c	2014-07-15 23:49:14.000000000 +0200
+++ qemu-2.1.0-rc2-fixed/hw/i386/kvm/clock.c	2014-07-17 15:08:25.627063756 +0200
@@ -126,7 +126,7 @@
             return;
         }
 
-        cpu_synchronize_all_states();
+        cpu_synchronize_all_states_always();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff -ru qemu-2.1.0-rc2/include/sysemu/cpus.h qemu-2.1.0-rc2-fixed/include/sysemu/cpus.h
--- qemu-2.1.0-rc2/include/sysemu/cpus.h	2014-07-15 23:49:14.000000000 +0200
+++ qemu-2.1.0-rc2-fixed/include/sysemu/cpus.h	2014-07-17 15:09:23.256578916 +0200
@@ -7,6 +7,7 @@
 void pause_all_vcpus(void);
 void cpu_stop_current(void);
 
+void cpu_synchronize_all_states_always(void);
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
 void cpu_synchronize_all_post_init(void);
diff -ru qemu-2.1.0-rc2/include/sysemu/kvm.h qemu-2.1.0-rc2-fixed/include/sysemu/kvm.h
--- qemu-2.1.0-rc2/include/sysemu/kvm.h	2014-07-15 23:49:14.000000000 +0200
+++ qemu-2.1.0-rc2-fixed/include/sysemu/kvm.h	2014-07-17 15:11:54.855303171 +0200
@@ -346,9 +346,11 @@
 #endif /* NEED_CPU_H */
 
 void kvm_cpu_synchronize_state(CPUState *cpu);
+void kvm_cpu_synchronize_state_always(CPUState *cpu);
 void kvm_cpu_synchronize_post_reset(CPUState *cpu);
 void kvm_cpu_synchronize_post_init(CPUState *cpu);
 
+
 /* generic hooks - to be moved/refactored once there are more users */
 
 static inline void cpu_synchronize_state(CPUState *cpu)
@@ -358,6 +360,13 @@
     }
 }
 
+static inline void cpu_synchronize_state_always(CPUState *cpu)
+{
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_state_always(cpu);
+    }
+}
+
 static inline void cpu_synchronize_post_reset(CPUState *cpu)
 {
     if (kvm_enabled()) {
diff -ru qemu-2.1.0-rc2/kvm-all.c qemu-2.1.0-rc2-fixed/kvm-all.c
--- qemu-2.1.0-rc2/kvm-all.c	2014-07-15 23:49:14.000000000 +0200
+++ qemu-2.1.0-rc2-fixed/kvm-all.c	2014-07-17 15:14:04.884208826 +0200
@@ -1652,6 +1652,13 @@
     s->coalesced_flush_in_progress = false;
 }
 
+static void do_kvm_cpu_synchronize_state_always(void *arg)
+{
+    CPUState *cpu = arg;
+
+    kvm_arch_get_registers(cpu);
+}
+
 static void do_kvm_cpu_synchronize_state(void *arg)
 {
     CPUState *cpu = arg;
@@ -1669,6 +1676,11 @@
     }
 }
 
+void kvm_cpu_synchronize_state_always(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_state_always, cpu);
+}
+
 void kvm_cpu_synchronize_post_reset(CPUState *cpu)
 {
     kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17 13:25                                   ` Marcin Gibuła
@ 2014-07-17 19:18                                     ` Dr. David Alan Gilbert
  2014-07-17 20:33                                       ` Marcin Gibuła
  2014-07-17 20:50                                     ` Andrey Korolyov
  2014-07-18  8:48                                     ` Paolo Bonzini
  2 siblings, 1 reply; 76+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-17 19:18 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Andrey Korolyov, Fam Zheng, Marcelo Tosatti,
	qemu-devel@nongnu.org, kraxel, Amit Shah, Paolo Bonzini

I don't know if this is the same case, but Gerd showed me a migration failure
that might be related.  2.0 seems OK, 2.1-rc0 is broken (and I've not found
another working point in between yet).

The test case involves booting a fedora livecd (using an IDE CDROM device)
and after the migration we're seeing squashfs errors and stuff gently
falling apart.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17 19:18                                     ` Dr. David Alan Gilbert
@ 2014-07-17 20:33                                       ` Marcin Gibuła
  0 siblings, 0 replies; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-17 20:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Andrey Korolyov, Fam Zheng, Marcelo Tosatti,
	qemu-devel@nongnu.org, kraxel, Amit Shah, Paolo Bonzini

On 2014-07-17 21:18, Dr. David Alan Gilbert wrote:
> I don't know if this is the same case, but Gerd showed me a migration failure
> that might be related.  2.0 seems OK, 2.1-rc0 is broken (and I've not found
> another working point in between yet).
>
> The test case involves booting a fedora livecd (using an IDE CDROM device)
> and after the migration we're seeing squashfs errors and stuff gently
> falling apart.

Perhaps you could try testing the workaround patch I sent earlier? It's 
not a proposal for inclusion, just a test patch that seems to fix the IO 
hang for me.

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17 13:25                                   ` Marcin Gibuła
  2014-07-17 19:18                                     ` Dr. David Alan Gilbert
@ 2014-07-17 20:50                                     ` Andrey Korolyov
  2014-07-18  8:21                                       ` Marcin Gibuła
  2014-07-18  8:48                                     ` Paolo Bonzini
  2 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-07-17 20:50 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

On Thu, Jul 17, 2014 at 5:25 PM, Marcin Gibuła <m.gibula@beyond.pl> wrote:
>>> 2.1-rc2 behaves exactly the same.
>>>
>>> Interestingly enough, resetting the guest system causes I/O to work
>>> again. So it's not qemu that hangs on IO; rather, it fails to notify the
>>> guest about completed operations that were issued during migration.
>>>
>>> And it's somehow caused by calling cpu_synchronize_all_states() inside
>>> kvmclock_vm_state_change().
>>>
>>>
>>>
>>> As for testing with cache=writeback, I'll try to set up some iscsi to
>>> test it.
>>
>>
>> Awesome, thanks! AFAIK you won't be able to use a write cache with
>> iscsi for migration. A VM which has been reset beforehand always hangs,
>> while freshly launched ones have a chance to migrate successfully. And
>> yes, at a glance it looks like the lower layer forgets to notify the
>> driver about some operations.
>
>
> Andrey,
>
> could you try the attached patch? It's an incredibly ugly workaround that
> calls cpu_synchronize_all_states() in a way that bypasses the lazy
> execution logic.
>
> But it works for me. If it works for you as well, it's somehow related to
> the lazy execution of cpu_synchronize_all_states.
>
> --
> mg

Yes, it is working well with writeback cache too.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17 20:50                                     ` Andrey Korolyov
@ 2014-07-18  8:21                                       ` Marcin Gibuła
  2014-07-18  8:36                                         ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-18  8:21 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

>> could you try the attached patch? It's an incredibly ugly workaround
>> that calls cpu_synchronize_all_states() in a way that bypasses the lazy
>> execution logic.
>>
>> But it works for me. If it works for you as well, it's somehow related to
>> the lazy execution of cpu_synchronize_all_states.
>>
>> --
>> mg
>
> Yes, it is working well with writeback cache too.

Does it fix the problem with libvirt migration timing out for you as well?

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-18  8:21                                       ` Marcin Gibuła
@ 2014-07-18  8:36                                         ` Andrey Korolyov
  2014-07-18  8:44                                           ` Marcin Gibuła
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-07-18  8:36 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Amit Shah, Paolo Bonzini, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

On Fri, Jul 18, 2014 at 12:21 PM, Marcin Gibuła <m.gibula@beyond.pl> wrote:
>>> could you try the attached patch? It's an incredibly ugly workaround
>>> that calls cpu_synchronize_all_states() in a way that bypasses the lazy
>>> execution logic.
>>>
>>> But it works for me. If it works for you as well, it's somehow related
>>> to the lazy execution of cpu_synchronize_all_states.
>>>
>>> --
>>> mg
>>
>>
>> Yes, it is working well with writeback cache too.
>
>
> Does it fix the problem with libvirt migration timing out for you as well?
>

Oh, forgot to mention - yes, all migration-related problems are fixed.
Though the release is in a freeze phase right now, I'd like to ask the
maintainers to consider the possibility of fixing the problem on top of
the current tree instead of just rolling back the problematic snippet.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-18  8:36                                         ` Andrey Korolyov
@ 2014-07-18  8:44                                           ` Marcin Gibuła
  2014-07-18  8:51                                             ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-18  8:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Andrey Korolyov, Fam Zheng, qemu-devel@nongnu.org,
	Marcelo Tosatti

>> Does it fix the problem with libvirt migration timing out for you as well?
>>
>
> Oh, forgot to mention - yes, all migration-related problems are fixed.
> Though the release is in a freeze phase right now, I'd like to ask the
> maintainers to consider the possibility of fixing the problem on top of
> the current tree instead of just rolling back the problematic snippet.

Paolo,

if the patch in its current form is not acceptable to you for inclusion, 
I'll try to rewrite it according to your comments.

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-17 13:25                                   ` Marcin Gibuła
  2014-07-17 19:18                                     ` Dr. David Alan Gilbert
  2014-07-17 20:50                                     ` Andrey Korolyov
@ 2014-07-18  8:48                                     ` Paolo Bonzini
  2014-07-18  8:57                                       ` Amit Shah
                                                         ` (2 more replies)
  2 siblings, 3 replies; 76+ messages in thread
From: Paolo Bonzini @ 2014-07-18  8:48 UTC (permalink / raw)
  To: Marcin Gibuła, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On 17/07/2014 15:25, Marcin Gibuła wrote:
> +static void do_kvm_cpu_synchronize_state_always(void *arg)
> +{
> +    CPUState *cpu = arg;
> +
> +    kvm_arch_get_registers(cpu);
> +}
> +

The name of the hack^Wfunction is tricky, because compared to 
do_kvm_cpu_synchronize_state there are three things you change:

1) you always synchronize the state

2) the next call to do_kvm_cpu_synchronize_state will do 
kvm_arch_get_registers

3) the next CPU entry will call kvm_arch_put_registers:

         if (cpu->kvm_vcpu_dirty) {
             kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
             cpu->kvm_vcpu_dirty = false;
         }

It is easy to find out if the "fix" is related to 1 or 2/3: just write

      if (cpu->kvm_vcpu_dirty) {
          printf ("do_kvm_cpu_synchronize_state_always: look at 2/3\n");
          kvm_arch_get_registers(cpu);
      } else {
          printf ("do_kvm_cpu_synchronize_state_always: look at 1\n");
      }

To further refine between 2 and 3, I suppose you can set a breakpoint on 
cpu_synchronize_all_states and kvm_cpu_exec, and see which is called 
first after cpu_synchronize_all_states_always.
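
(For example, with gdb attached to the source-side QEMU, something like:

     (gdb) break cpu_synchronize_all_states
     (gdb) break kvm_cpu_exec
     (gdb) continue

and then note which breakpoint fires first once the migration
completes.)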

I still lean very much towards reverting the patches now.  We can 
reapply them, fixed, in 2.1.1.

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-18  8:44                                           ` Marcin Gibuła
@ 2014-07-18  8:51                                             ` Paolo Bonzini
  0 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2014-07-18  8:51 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Amit Shah, Andrey Korolyov, Fam Zheng, qemu-devel@nongnu.org,
	Marcelo Tosatti

On 18/07/2014 10:44, Marcin Gibuła wrote:
>>
>
> Paolo,
>
> if the patch in its current form is not acceptable to you for inclusion,
> I'll try to rewrite it according to your comments.

The problem is that we don't know _why_ the patch is fixing things.

Considering that your kvmclock bug has been there literally for years, I 
think it's better to delay that one to 2.2.

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-18  8:48                                     ` Paolo Bonzini
@ 2014-07-18  8:57                                       ` Amit Shah
  2014-07-18  9:32                                       ` Marcin Gibuła
  2014-07-29 16:58                                       ` Paolo Bonzini
  2 siblings, 0 replies; 76+ messages in thread
From: Amit Shah @ 2014-07-18  8:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, Andrey Korolyov, Marcin Gibuła,
	qemu-devel@nongnu.org, Marcelo Tosatti

On (Fri) 18 Jul 2014 [10:48:40], Paolo Bonzini wrote:
> On 17/07/2014 15:25, Marcin Gibuła wrote:
> >+static void do_kvm_cpu_synchronize_state_always(void *arg)
> >+{
> >+    CPUState *cpu = arg;
> >+
> >+    kvm_arch_get_registers(cpu);
> >+}
> >+
> 
> The name of the hack^Wfunction is tricky, because compared to
> do_kvm_cpu_synchronize_state there are three things you change:
> 
> 1) you always synchronize the state
> 
> 2) the next call to do_kvm_cpu_synchronize_state will do
> kvm_arch_get_registers
> 
> 3) the next CPU entry will call kvm_arch_put_registers:
> 
>         if (cpu->kvm_vcpu_dirty) {
>             kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
>             cpu->kvm_vcpu_dirty = false;
>         }
> 
> It is easy to find out if the "fix" is related to 1 or 2/3: just write
> 
>      if (cpu->kvm_vcpu_dirty) {
>          printf ("do_kvm_cpu_synchronize_state_always: look at 2/3\n");
>          kvm_arch_get_registers(cpu);
>      } else {
>          printf ("do_kvm_cpu_synchronize_state_always: look at 1\n");
>      }
> 
> To further refine between 2 and 3, I suppose you can set a breakpoint on
> cpu_synchronize_all_states and kvm_cpu_exec, and see which is called first
> after cpu_synchronize_all_states_always.
> 
> I still lean very much towards reverting the patches now.  We can reapply
> them, fixed, in 2.1.1.

FWIW I agree with this plan.

		Amit

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-18  8:48                                     ` Paolo Bonzini
  2014-07-18  8:57                                       ` Amit Shah
@ 2014-07-18  9:32                                       ` Marcin Gibuła
  2014-07-18  9:37                                         ` Paolo Bonzini
  2014-07-29 16:58                                       ` Paolo Bonzini
  2 siblings, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-18  9:32 UTC (permalink / raw)
  To: Paolo Bonzini, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

> The name of the hack^Wfunction is tricky, because compared to
> do_kvm_cpu_synchronize_state there are three things you change:
>
> 1) you always synchronize the state
>
> 2) the next call to do_kvm_cpu_synchronize_state will do
> kvm_arch_get_registers

Yes.

> 3) the next CPU entry will call kvm_arch_put_registers:
>
>          if (cpu->kvm_vcpu_dirty) {
>              kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
>              cpu->kvm_vcpu_dirty = false;
>          }

But, I don't set cpu->kvm_vcpu_dirty anywhere (?).

> I still lean very much towards reverting the patches now.  We can
> reapply them, fixed, in 2.1.1.

That's probably good idea.

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-18  9:32                                       ` Marcin Gibuła
@ 2014-07-18  9:37                                         ` Paolo Bonzini
  2014-07-18  9:48                                           ` Marcin Gibuła
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2014-07-18  9:37 UTC (permalink / raw)
  To: Marcin Gibuła, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On 18/07/2014 11:32, Marcin Gibuła wrote:
>
>> 3) the next CPU entry will call kvm_arch_put_registers:
>>
>>          if (cpu->kvm_vcpu_dirty) {
>>              kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
>>              cpu->kvm_vcpu_dirty = false;
>>          }
>
> But, I don't set cpu->kvm_vcpu_dirty anywhere (?).

Yeah, the next CPU entry will *not* call kvm_arch_put_registers with 
your change.  It will call it with vanilla cpu_synchronize_all_states().

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-18  9:37                                         ` Paolo Bonzini
@ 2014-07-18  9:48                                           ` Marcin Gibuła
  0 siblings, 0 replies; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-18  9:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Andrey Korolyov, Fam Zheng, qemu-devel@nongnu.org,
	Marcelo Tosatti

On 2014-07-18 11:37, Paolo Bonzini wrote:
> On 18/07/2014 11:32, Marcin Gibuła wrote:
>>
>>> 3) the next CPU entry will call kvm_arch_put_registers:
>>>
>>>          if (cpu->kvm_vcpu_dirty) {
>>>              kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
>>>              cpu->kvm_vcpu_dirty = false;
>>>          }
>>
>> But, I don't set cpu->kvm_vcpu_dirty anywhere (?).
>
> Yeah, the next CPU entry will *not* call kvm_arch_put_registers with
> your change.  It will call it with vanilla cpu_synchronize_all_states().

That's because in kvmclock, it's used only to read cpu registers, not 
edit them.

Now, because making this call "invisible" makes it work, I'm speculating 
that the following happens:

[migration starts]
kvmclock: calls cpu_synchronize_all_states()
somewhere in qemu: completes IO
somewhere in qemu: calls cpu_synchronize_all_states() <- old state


Is it (or something similar) possible? I didn't dig deep enough into the 
internals yet, but perhaps you could point out if that's the right 
direction?

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-18  8:48                                     ` Paolo Bonzini
  2014-07-18  8:57                                       ` Amit Shah
  2014-07-18  9:32                                       ` Marcin Gibuła
@ 2014-07-29 16:58                                       ` Paolo Bonzini
  2014-07-30 12:02                                         ` Marcin Gibuła
  2 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2014-07-29 16:58 UTC (permalink / raw)
  To: Marcin Gibuła, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On 18/07/2014 10:48, Paolo Bonzini wrote:
> 
> It is easy to find out if the "fix" is related to 1 or 2/3: just write
> 
>      if (cpu->kvm_vcpu_dirty) {
>          printf ("do_kvm_cpu_synchronize_state_always: look at 2/3\n");
>          kvm_arch_get_registers(cpu);
>      } else {
>          printf ("do_kvm_cpu_synchronize_state_always: look at 1\n");
>      }
> 
> To further refine between 2 and 3, I suppose you can set a breakpoint on
> cpu_synchronize_all_states and kvm_cpu_exec, and see which is called
> first after cpu_synchronize_all_states_always.

Marcin, have you ever gotten round to doing this?

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-29 16:58                                       ` Paolo Bonzini
@ 2014-07-30 12:02                                         ` Marcin Gibuła
  2014-07-30 13:38                                           ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-30 12:02 UTC (permalink / raw)
  To: Paolo Bonzini, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On 29.07.2014 18:58, Paolo Bonzini wrote:
> On 18/07/2014 10:48, Paolo Bonzini wrote:
>>
>> It is easy to find out if the "fix" is related to 1 or 2/3: just write
>>
>>       if (cpu->kvm_vcpu_dirty) {
>>           printf ("do_kvm_cpu_synchronize_state_always: look at 2/3\n");
>>           kvm_arch_get_registers(cpu);
>>       } else {
>>           printf ("do_kvm_cpu_synchronize_state_always: look at 1\n");
>>       }
>>
>> To further refine between 2 and 3, I suppose you can set a breakpoint on
>> cpu_synchronize_all_states and kvm_cpu_exec, and see which is called
>> first after cpu_synchronize_all_states_always.
>
> Marcin, have you ever gotten round to doing this?

Source side of migration, without my ugly hack:

called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called kvm_cpu_synchronize_state: vcpu dirty
called kvm_cpu_synchronize_state: vcpu dirty
shutting down

without it:

called do_kvm_cpu_synchronize_state_always
called do_kvm_cpu_synchronize_state_always
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
shutting down

So it's probably about 2 from your list ("the next call to 
do_kvm_cpu_synchronize_state will do kvm_arch_get_registers").

I've tapped into kvm_cpu_exec() to find out if it's 
kvm_arch_put_registers(), but nothing was logged during migration so 
it's probably not 3.
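
(The tap was just a printf in the dirty check at the top of
kvm_cpu_exec() - the snippet you quoted - roughly:

    if (cpu->kvm_vcpu_dirty) {
        fprintf(stderr, "kvm_cpu_exec: put_registers\n"); /* never fired */
        kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
        cpu->kvm_vcpu_dirty = false;
    }
)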

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-30 12:02                                         ` Marcin Gibuła
@ 2014-07-30 13:38                                           ` Paolo Bonzini
  2014-07-30 22:12                                             ` Marcin Gibuła
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2014-07-30 13:38 UTC (permalink / raw)
  To: Marcin Gibuła, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On 30/07/2014 14:02, Marcin Gibuła wrote:
> without it:
> 
> called do_kvm_cpu_synchronize_state_always
> called do_kvm_cpu_synchronize_state_always
> called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
> called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
> shutting down
> 
> So it's probably about 2 from your list ("the next call to
> do_kvm_cpu_synchronize_state will do kvm_arch_get_registers").

Can you dump *env before and after the call to kvm_arch_get_registers?

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-30 13:38                                           ` Paolo Bonzini
@ 2014-07-30 22:12                                             ` Marcin Gibuła
  2014-07-31 11:27                                               ` Marcin Gibuła
  0 siblings, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-30 22:12 UTC (permalink / raw)
  To: Paolo Bonzini, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On 2014-07-30 15:38, Paolo Bonzini wrote:
> On 30/07/2014 14:02, Marcin Gibuła wrote:
>> without it:

s/without/with/ of course...

>> called do_kvm_cpu_synchronize_state_always
>> called do_kvm_cpu_synchronize_state_always
>> called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
>> called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
>> shutting down
>>
>> So it's probably about 2 from your list ("the next call to
>> do_kvm_cpu_synchronize_state will do kvm_arch_get_registers").
>
> Can you dump *env before and after the call to kvm_arch_get_registers?

Yes, but it seems they are equal - I used memcmp() to compare them. Is 
there any other side effect that cpu_synchronize_all_states() may have?
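
(Roughly how I compared them - x86-only instrumentation hacked in
around the kvm_arch_get_registers() call for the test, with the
snapshot kept static just to stay off the stack:

    static CPUX86State env_before;

    env_before = X86_CPU(cpu)->env;           /* copy *env before */
    kvm_arch_get_registers(cpu);              /* the call under test */
    if (memcmp(&env_before, &X86_CPU(cpu)->env,
               sizeof(env_before)) != 0) {
        fprintf(stderr, "env changed\n");     /* never printed */
    }
)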

The second caller of this function is qemu_savevm_state_complete().

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-30 22:12                                             ` Marcin Gibuła
@ 2014-07-31 11:27                                               ` Marcin Gibuła
  2014-08-04 16:30                                                 ` Marcin Gibuła
  0 siblings, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-07-31 11:27 UTC (permalink / raw)
  To: Paolo Bonzini, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

>> Can you dump *env before and after the call to kvm_arch_get_registers?
>
> Yes, but it seems they are equal - I used memcmp() to compare them. Is
> there any other side effect that cpu_synchronize_all_states() may have?

I think I found it.

The reason for the hang is that when the second call to 
kvm_arch_get_registers() is skipped, it also skips kvm_get_apic(), which 
updates cpu->apic_state.
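
For reference, kvm_arch_get_registers() in target-i386/kvm.c is a chain
of getters, roughly (heavily trimmed, from memory of the 2.1 tree):

    int kvm_arch_get_registers(CPUState *cs)
    {
        X86CPU *cpu = X86_CPU(cs);
        int ret;

        ret = kvm_getput_regs(cpu, 0);
        ...
        ret = kvm_get_mp_state(cpu);
        ...
        ret = kvm_get_apic(cpu);   /* refreshes cpu->apic_state from KVM */
        ...
        return 0;
    }

So when the dirty flag short-circuits the second synchronize, the APIC
state that gets migrated is presumably the one captured by kvmclock's
early call, missing whatever was injected into the in-kernel APIC (e.g.
virtio completion interrupts) after that point.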

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-07-31 11:27                                               ` Marcin Gibuła
@ 2014-08-04 16:30                                                 ` Marcin Gibuła
  2014-08-04 18:30                                                   ` Paolo Bonzini
  2014-10-09 19:07                                                   ` Eduardo Habkost
  0 siblings, 2 replies; 76+ messages in thread
From: Marcin Gibuła @ 2014-08-04 16:30 UTC (permalink / raw)
  To: Paolo Bonzini, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On 2014-07-31 13:27, Marcin Gibuła wrote:
>>> Can you dump *env before and after the call to kvm_arch_get_registers?
>>
>> Yes, but it seems they are equal - I used memcmp() to compare them. Is
>> there any other side effect that cpu_synchronize_all_states() may have?
>
> I think I found it.
>
> The reason for the hang is that when the second call to
> kvm_arch_get_registers() is skipped, it also skips kvm_get_apic(), which
> updates cpu->apic_state.

Paolo,

is this analysis deep enough for you? I don't know if that can be fixed 
with the existing API, as cpu_synchronize_all_states() is an 
all-or-nothing kind of thing.

Kvmclock needs it only to read the current cpu registers, so syncing 
everything is not really necessary. Perhaps exporting one of the 
kvm_arch_get_* functions would be enough. And it wouldn't mess with the 
lazy get/put.

On the other hand, if in the future any other driver adds 
cpu_synchronize_all_states() to its change-state callback, it could 
result in the same error, so perhaps a more generic approach is needed.

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-04 16:30                                                 ` Marcin Gibuła
@ 2014-08-04 18:30                                                   ` Paolo Bonzini
  2014-08-08 21:37                                                     ` Marcelo Tosatti
  2014-10-09 19:07                                                   ` Eduardo Habkost
  1 sibling, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2014-08-04 18:30 UTC (permalink / raw)
  To: Marcin Gibuła, Andrey Korolyov
  Cc: Amit Shah, Marcelo Tosatti, Fam Zheng, qemu-devel@nongnu.org

On 04/08/2014 18:30, Marcin Gibuła wrote:
> 
> 
> is this analysis deep enough for you? I don't know if that can be fixed
> with the existing API, as cpu_synchronize_all_states() is an
> all-or-nothing kind of thing.
> 
> Kvmclock needs it only to read the current cpu registers, so syncing
> everything is not really necessary. Perhaps exporting one of the
> kvm_arch_get_* functions would be enough. And it wouldn't mess with the
> lazy get/put.
> 
> On the other hand, if in the future any other driver adds
> cpu_synchronize_all_states() to its change-state callback, it could
> result in the same error, so perhaps a more generic approach is needed.

Yeah, I need to sit down and look at the code more closely...  Perhaps a
cpu_mark_all_dirty() is enough.

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-04 18:30                                                   ` Paolo Bonzini
@ 2014-08-08 21:37                                                     ` Marcelo Tosatti
  2014-08-09  6:35                                                       ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: Marcelo Tosatti @ 2014-08-08 21:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Fam Zheng, Andrey Korolyov, Marcin Gibuła,
	qemu-devel@nongnu.org

On Mon, Aug 04, 2014 at 08:30:48PM +0200, Paolo Bonzini wrote:
> On 04/08/2014 18:30, Marcin Gibuła wrote:
> > 
> > 
> > is this analysis deep enough for you? I don't know if that can be fixed
> > with the existing API, as cpu_synchronize_all_states() is an
> > all-or-nothing kind of thing.
> > 
> > Kvmclock needs it only to read the current cpu registers, so syncing
> > everything is not really necessary. Perhaps exporting one of the
> > kvm_arch_get_* functions would be enough. And it wouldn't mess with the
> > lazy get/put.
> > 
> > On the other hand, if in the future any other driver adds
> > cpu_synchronize_all_states() to its change-state callback, it could
> > result in the same error, so perhaps a more generic approach is needed.
> 
> Yeah, I need to sit down and look at the code more closely...  Perhaps a
> cpu_mark_all_dirty() is enough.

Hi Paolo,

cpu_clean_all_dirty, you mean? It has the same effect.

Marcin's patch to add cpu_synchronize_state_always() has the same
effect.

What do you prefer?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-08 21:37                                                     ` Marcelo Tosatti
@ 2014-08-09  6:35                                                       ` Paolo Bonzini
  2014-08-21 15:48                                                         ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2014-08-09  6:35 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Fam Zheng, Andrey Korolyov, Marcin Gibuła,
	qemu-devel


> > Yeah, I need to sit down and look at the code more closely...  Perhaps a
> > cpu_mark_all_dirty() is enough.
> 
> Hi Paolo,
> 
> cpu_clean_all_dirty, you mean? Has the same effect.
> 
> Marcin's patch to add cpu_synchronize_state_always() has the same
> effect.
> 
> What do you prefer ?

I'd prefer cpu_clean_all_dirty because you can call it from the APIC
load functions.  The bug with your patch is due to the APIC and to
migration; it's not in the way your patch touches the kvmclock
vmstate_change_handler.
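
A minimal sketch of what I mean, assuming a post_load-style hook (the
actual function name and placement may differ):

    static int apic_post_load(void *opaque, int version_id)
    {
        /* Clear the per-CPU dirty flags so that the next synchronize
         * call really re-reads state from KVM instead of being
         * skipped. */
        cpu_clean_all_dirty();
        return 0;
    }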

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-09  6:35                                                       ` Paolo Bonzini
@ 2014-08-21 15:48                                                         ` Andrey Korolyov
  2014-08-21 16:41                                                           ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-21 15:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Fam Zheng, Marcelo Tosatti, Marcin Gibuła,
	qemu-devel@nongnu.org

On Sat, Aug 9, 2014 at 10:35 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> > Yeah, I need to sit down and look at the code more closely...  Perhaps a
>> > cpu_mark_all_dirty() is enough.
>>
>> Hi Paolo,
>>
>> cpu_clean_all_dirty, you mean? Has the same effect.
>>
>> Marcin's patch to add cpu_synchronize_state_always() has the same
>> effect.
>>
>> What do you prefer ?
>
> I'd prefer cpu_clean_all_dirty because you can call it from the APIC
> load functions.  The bug with your patch is due to the APIC and to
> migration, it's not in the way your patch touches the kvmclock
> vmstate_change_handler.
>
> Paolo

Hello,


JFYI - Windows shows the same behavior after migration on bare
2.1 (frozen disk with the latest virtio block drivers). Reverting
agraf`s patches plus applying Marcin`s fix saves the situation, so
AFAICS the problem is critical for M$ products on KVM.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-21 15:48                                                         ` Andrey Korolyov
@ 2014-08-21 16:41                                                           ` Andrey Korolyov
  2014-08-21 16:44                                                             ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-21 16:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Fam Zheng, Marcelo Tosatti, Marcin Gibuła,
	qemu-devel@nongnu.org

On Thu, Aug 21, 2014 at 7:48 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Sat, Aug 9, 2014 at 10:35 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>> > Yeah, I need to sit down and look at the code more closely...  Perhaps a
>>> > cpu_mark_all_dirty() is enough.
>>>
>>> Hi Paolo,
>>>
>>> cpu_clean_all_dirty, you mean? Has the same effect.
>>>
>>> Marcin's patch to add cpu_synchronize_state_always() has the same
>>> effect.
>>>
>>> What do you prefer ?
>>
>> I'd prefer cpu_clean_all_dirty because you can call it from the APIC
>> load functions.  The bug with your patch is due to the APIC and to
>> migration, it's not in the way your patch touches the kvmclock
>> vmstate_change_handler.
>>
>> Paolo
>
> Hello,
>
>
> JFYI - Windows showing the same behavior after migration using bare
> 2.1 (frozen disk with latest virtio block drivers). Reverting agraf`s
> patches back plus Marcin`s fix saves the situation, so AFAICS problem
> is critical to M$ products on KVM.

Sorry, the test series revealed that the problem is still here,
although with a lower hit ratio, on modified 2.1-HEAD with the selected
argument set. The actual root of the issue is the '-cpu
qemu64,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000'
Windows-specific addition, which can be traded for some idle CPU
consumption. Reproducing the bug is quite simple - fire up a Windows
VM, migrate it two or three times and then try to log in using
rdesktop/VNC - if the disk is frozen, the login progress will freeze
too. Blocked I/O is not so easy to detect on Windows in an interactive
session due to the lack of a soft-lockup-warning equivalent, so one can
use a check sequence like the one sketched below.
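
For example (hostnames, the VM name and the iteration count are
placeholders):

    for i in $(seq 1 3) ; do
        virsh migrate --live --persistent --undefinesource vmXX qemu+tcp://hostB/system
        sleep 5
        ssh hostB "virsh migrate --live --persistent --undefinesource vmXX qemu+tcp://hostA/system"
    done
    # then try an rdesktop/VNC login; a frozen login progress screen
    # indicates stuck I/O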

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-21 16:41                                                           ` Andrey Korolyov
@ 2014-08-21 16:44                                                             ` Paolo Bonzini
  2014-08-21 17:51                                                               ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2014-08-21 16:44 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Fam Zheng, Marcelo Tosatti, Marcin Gibuła,
	qemu-devel@nongnu.org

Il 21/08/2014 18:41, Andrey Korolyov ha scritto:
> Sorry, the test series revealed that the problem is still here, but
> with lower hit ratio with modified 2.1-HEAD using selected argument
> set. The actual root of the issue is in '-cpu
> qemu64,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000'
> Windows-specific addition which can be traded for some idle CPU
> consumption. Reproduction of bug is quite simple - fire up Windows VM,
> migrate it two-three times and then try to log in using rdesktop/VNC -
> if disk is frozen, login progress will freeze too. It is not so easy
> to detect blocked I/O on Windows in the interactive session due to
> lack of soft lockup warning equivalent, so one can try such a sequence
> to check.

What kernel version?

The ioapic patches at http://thread.gmane.org/gmane.linux.kernel/1671045
fixed exactly a Windows-specific hang in migration.  But it didn't require
Hyper-V enlightenments.

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-21 16:44                                                             ` Paolo Bonzini
@ 2014-08-21 17:51                                                               ` Andrey Korolyov
  2014-08-22 16:44                                                                 ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-21 17:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Fam Zheng, Marcelo Tosatti, Marcin Gibuła,
	qemu-devel@nongnu.org

On Thu, Aug 21, 2014 at 8:44 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 21/08/2014 18:41, Andrey Korolyov ha scritto:
>> Sorry, the test series revealed that the problem is still here, but
>> with lower hit ratio with modified 2.1-HEAD using selected argument
>> set. The actual root of the issue is in '-cpu
>> qemu64,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000'
>> Windows-specific addition which can be traded for some idle CPU
>> consumption. Reproduction of bug is quite simple - fire up Windows VM,
>> migrate it two-three times and then try to log in using rdesktop/VNC -
>> if disk is frozen, login progress will freeze too. It is not so easy
>> to detect blocked I/O on Windows in the interactive session due to
>> lack of soft lockup warning equivalent, so one can try such a sequence
>> to check.
>
> What kernel version?
>
> The ioapic patches at http://thread.gmane.org/gmane.linux.kernel/1671045
> fixed exactly a Windows-specific hang in migration.  But it didn't require
> Hyper-V enlightenments.
>
> Paolo

I`m running 3.10, so those patches are not there; I will try 3.16 soon.
Even if the problem turns out to be fixed there, it will still be
specific to 2.1 - earlier releases work well - and I`ll bisect it in
due time.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-21 17:51                                                               ` Andrey Korolyov
@ 2014-08-22 16:44                                                                 ` Andrey Korolyov
  2014-08-22 17:45                                                                   ` Marcelo Tosatti
  2014-08-22 17:55                                                                   ` Paolo Bonzini
  0 siblings, 2 replies; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-22 16:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Fam Zheng, Marcelo Tosatti, Marcin Gibuła,
	qemu-devel@nongnu.org

>
> I`m running 3.10, so patches are not here, will try 3.16 soon. Even if
> problem will be fixed, it still will be specific for 2.1, earlier
> releases working well and I`ll bisect at a time.

Thanks, using 3.16 indeed helped. Since the bug remains as-is with 2.1
on LTS 3.10, should I find the breaking commit, or is it better to
backport the apic-related kvm subsystem changes to 3.10?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 16:44                                                                 ` Andrey Korolyov
@ 2014-08-22 17:45                                                                   ` Marcelo Tosatti
  2014-08-22 18:39                                                                     ` Andrey Korolyov
  2014-08-22 17:55                                                                   ` Paolo Bonzini
  1 sibling, 1 reply; 76+ messages in thread
From: Marcelo Tosatti @ 2014-08-22 17:45 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, Marcin Gibuła,
	qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 599 bytes --]

On Fri, Aug 22, 2014 at 08:44:53PM +0400, Andrey Korolyov wrote:
> >
> > I`m running 3.10, so patches are not here, will try 3.16 soon. Even if
> > problem will be fixed, it still will be specific for 2.1, earlier
> > releases working well and I`ll bisect at a time.
> 
> Thanks, using 3.16 helped indeed. Though the bug remains as is at 2.1
> on LTS 3.10, should I find the breaking commit, or it is better to
> backport kvm subsystem changes related to apic to 3.10?

Andrey,

Can you try the patchset (which includes cpu_clean_all_dirty) please?
It should be equivalent to your proposed patch.



[-- Attachment #2: patches-kvmclock.tar.gz --]
[-- Type: application/gzip, Size: 2605 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 16:44                                                                 ` Andrey Korolyov
  2014-08-22 17:45                                                                   ` Marcelo Tosatti
@ 2014-08-22 17:55                                                                   ` Paolo Bonzini
  1 sibling, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2014-08-22 17:55 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Fam Zheng, Marcelo Tosatti, Marcin Gibuła,
	qemu-devel@nongnu.org

Il 22/08/2014 18:44, Andrey Korolyov ha scritto:
>>
>> I`m running 3.10, so patches are not here, will try 3.16 soon. Even if
>> problem will be fixed, it still will be specific for 2.1, earlier
>> releases working well and I`ll bisect at a time.
> 
> Thanks, using 3.16 helped indeed. Though the bug remains as is at 2.1
> on LTS 3.10, should I find the breaking commit, or it is better to
> backport kvm subsystem changes related to apic to 3.10?

I think I'd suggest the latter.  If the fix was brought by those ioapic
patches, the bug has always been latent as far as I could see.

I'm afraid that bisection would be pretty time consuming, also because
the bug didn't reproduce 100% for me (and at some point stopped
reproducing even on 3.10, causing some serious wtf...).

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 17:45                                                                   ` Marcelo Tosatti
@ 2014-08-22 18:39                                                                     ` Andrey Korolyov
  2014-08-22 19:05                                                                       ` Marcelo Tosatti
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-22 18:39 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, Marcin Gibuła,
	qemu-devel@nongnu.org

On Fri, Aug 22, 2014 at 9:45 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Aug 22, 2014 at 08:44:53PM +0400, Andrey Korolyov wrote:
>> >
>> > I`m running 3.10, so patches are not here, will try 3.16 soon. Even if
>> > problem will be fixed, it still will be specific for 2.1, earlier
>> > releases working well and I`ll bisect at a time.
>>
>> Thanks, using 3.16 helped indeed. Though the bug remains as is at 2.1
>> on LTS 3.10, should I find the breaking commit, or it is better to
>> backport kvm subsystem changes related to apic to 3.10?
>
> Andrey,
>
> Can you try the patchset (which includes cpu_clean_all_dirty) please?
> It should be equivalent to your proposed patch.
>
>


Just checked the Windows case; the bug is still here (twelve continuous
ping-pong migrations between two hosts). There is also a very weird
observation: if the VM has stalled, the password window renders in a
crippled way right after connection, before any VNC region refreshes
are triggered by the mouse, whereas in the vm-is-alive case the
after-ctrl-alt-del prompt renders properly. It will probably be easier
to check the behaviour against Paolo`s patchset for the RH kernel,
though I`m still in the middle of cutting out the rest to adapt the
patches on top of bare 3.10; for those with access to the git log it
should be easier :)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 18:39                                                                     ` Andrey Korolyov
@ 2014-08-22 19:05                                                                       ` Marcelo Tosatti
  2014-08-22 19:05                                                                         ` Marcelo Tosatti
  0 siblings, 1 reply; 76+ messages in thread
From: Marcelo Tosatti @ 2014-08-22 19:05 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, Marcin Gibuła,
	qemu-devel@nongnu.org

On Fri, Aug 22, 2014 at 10:39:38PM +0400, Andrey Korolyov wrote:
> On Fri, Aug 22, 2014 at 9:45 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Fri, Aug 22, 2014 at 08:44:53PM +0400, Andrey Korolyov wrote:
> >> >
> >> > I`m running 3.10, so patches are not here, will try 3.16 soon. Even if
> >> > problem will be fixed, it still will be specific for 2.1, earlier
> >> > releases working well and I`ll bisect at a time.
> >>
> >> Thanks, using 3.16 helped indeed. Though the bug remains as is at 2.1
> >> on LTS 3.10, should I find the breaking commit, or it is better to
> >> backport kvm subsystem changes related to apic to 3.10?
> >
> > Andrey,
> >
> > Can you try the patchset (which includes cpu_clean_all_dirty) please?
> > It should be equivalent to your proposed patch.
> >
> >
> 
> 
> Just checked case for Windows, bug is still here (twelve continuous
> ping-pong migrations between two hosts). Also there is very weird
> observation that the if VM stalled, password window renders in a
> cripple way just after connection when no VNC region refreshes are
> made yet by mouse, vm-is-alive case rendering after-c-a-d prompt
> properly. Probably it`ll be easier to check behaviour against Paolo`
> patchset for RH kernel, though I`m still in progress with cutting out
> the rest to adapt them on top of bare 3.10, for ones with access to
> git log it should be easier :)

Argh, I forgot cpu_sync_all_states. Will resend shortly.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 19:05                                                                       ` Marcelo Tosatti
@ 2014-08-22 19:05                                                                         ` Marcelo Tosatti
  2014-08-22 19:51                                                                           ` Andrey Korolyov
  2014-08-22 21:01                                                                           ` Marcelo Tosatti
  0 siblings, 2 replies; 76+ messages in thread
From: Marcelo Tosatti @ 2014-08-22 19:05 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, Marcin Gibuła,
	qemu-devel@nongnu.org

On Fri, Aug 22, 2014 at 04:05:07PM -0300, Marcelo Tosatti wrote:
> On Fri, Aug 22, 2014 at 10:39:38PM +0400, Andrey Korolyov wrote:
> > On Fri, Aug 22, 2014 at 9:45 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > On Fri, Aug 22, 2014 at 08:44:53PM +0400, Andrey Korolyov wrote:
> > >> >
> > >> > I`m running 3.10, so patches are not here, will try 3.16 soon. Even if
> > >> > problem will be fixed, it still will be specific for 2.1, earlier
> > >> > releases working well and I`ll bisect at a time.
> > >>
> > >> Thanks, using 3.16 helped indeed. Though the bug remains as is at 2.1
> > >> on LTS 3.10, should I find the breaking commit, or it is better to
> > >> backport kvm subsystem changes related to apic to 3.10?
> > >
> > > Andrey,
> > >
> > > Can you try the patchset (which includes cpu_clean_all_dirty) please?
> > > It should be equivalent to your proposed patch.
> > >
> > >
> > 
> > 
> > Just checked case for Windows, bug is still here (twelve continuous
> > ping-pong migrations between two hosts). Also there is very weird
> > observation that the if VM stalled, password window renders in a
> > cripple way just after connection when no VNC region refreshes are
> > made yet by mouse, vm-is-alive case rendering after-c-a-d prompt
> > properly. Probably it`ll be easier to check behaviour against Paolo`
> > patchset for RH kernel, though I`m still in progress with cutting out
> > the rest to adapt them on top of bare 3.10, for ones with access to
> > git log it should be easier :)
> 
> Argh i forgot cpu_sync_all_states. Will resend shortly.

Well, can you add cpu_synchronize_all_states() before 
the cpu_clean_all_dirty() call and retry?

Sorry.
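
I.e., in the kvmclock vmstate change handler, something along these
lines (a rough sketch; surrounding context elided and placement
assumed):

    static void kvmclock_vm_state_change(void *opaque, int running,
                                         RunState state)
    {
        if (!running) {
            /* pull the current register state from KVM first... */
            cpu_synchronize_all_states();
            /* ...then clear the dirty flags so the next synchronize
             * re-reads from KVM instead of being skipped */
            cpu_clean_all_dirty();
        }
    }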

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 19:05                                                                         ` Marcelo Tosatti
@ 2014-08-22 19:51                                                                           ` Andrey Korolyov
  2014-08-22 21:01                                                                           ` Marcelo Tosatti
  1 sibling, 0 replies; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-22 19:51 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, Marcin Gibuła,
	qemu-devel@nongnu.org

On Fri, Aug 22, 2014 at 11:05 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Aug 22, 2014 at 04:05:07PM -0300, Marcelo Tosatti wrote:
>> On Fri, Aug 22, 2014 at 10:39:38PM +0400, Andrey Korolyov wrote:
>> > On Fri, Aug 22, 2014 at 9:45 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > > On Fri, Aug 22, 2014 at 08:44:53PM +0400, Andrey Korolyov wrote:
>> > >> >
>> > >> > I`m running 3.10, so patches are not here, will try 3.16 soon. Even if
>> > >> > problem will be fixed, it still will be specific for 2.1, earlier
>> > >> > releases working well and I`ll bisect at a time.
>> > >>
>> > >> Thanks, using 3.16 helped indeed. Though the bug remains as is at 2.1
>> > >> on LTS 3.10, should I find the breaking commit, or it is better to
>> > >> backport kvm subsystem changes related to apic to 3.10?
>> > >
>> > > Andrey,
>> > >
>> > > Can you try the patchset (which includes cpu_clean_all_dirty) please?
>> > > It should be equivalent to your proposed patch.
>> > >
>> > >
>> >
>> >
>> > Just checked case for Windows, bug is still here (twelve continuous
>> > ping-pong migrations between two hosts). Also there is very weird
>> > observation that the if VM stalled, password window renders in a
>> > cripple way just after connection when no VNC region refreshes are
>> > made yet by mouse, vm-is-alive case rendering after-c-a-d prompt
>> > properly. Probably it`ll be easier to check behaviour against Paolo`
>> > patchset for RH kernel, though I`m still in progress with cutting out
>> > the rest to adapt them on top of bare 3.10, for ones with access to
>> > git log it should be easier :)
>>
>> Argh i forgot cpu_sync_all_states. Will resend shortly.
>
> Well, can you add cpu_synchronize_all_states() before
> the cpu_clean_all_dirty() call and retry?
>
> Sorry.
>

Still nope, and the receiving libvirtd eventually crashed during the
test, lol.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 19:05                                                                         ` Marcelo Tosatti
  2014-08-22 19:51                                                                           ` Andrey Korolyov
@ 2014-08-22 21:01                                                                           ` Marcelo Tosatti
  2014-08-22 22:21                                                                             ` Andrey Korolyov
  2014-08-24 16:19                                                                             ` Andrey Korolyov
  1 sibling, 2 replies; 76+ messages in thread
From: Marcelo Tosatti @ 2014-08-22 21:01 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, Marcin Gibuła,
	qemu-devel@nongnu.org

On Fri, Aug 22, 2014 at 04:05:46PM -0300, Marcelo Tosatti wrote:
> On Fri, Aug 22, 2014 at 04:05:07PM -0300, Marcelo Tosatti wrote:
> > On Fri, Aug 22, 2014 at 10:39:38PM +0400, Andrey Korolyov wrote:
> > > On Fri, Aug 22, 2014 at 9:45 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > On Fri, Aug 22, 2014 at 08:44:53PM +0400, Andrey Korolyov wrote:
> > > >> >
> > > >> > I`m running 3.10, so patches are not here, will try 3.16 soon. Even if
> > > >> > problem will be fixed, it still will be specific for 2.1, earlier
> > > >> > releases working well and I`ll bisect at a time.
> > > >>
> > > >> Thanks, using 3.16 helped indeed. Though the bug remains as is at 2.1
> > > >> on LTS 3.10, should I find the breaking commit, or it is better to
> > > >> backport kvm subsystem changes related to apic to 3.10?
> > > >
> > > > Andrey,
> > > >
> > > > Can you try the patchset (which includes cpu_clean_all_dirty) please?
> > > > It should be equivalent to your proposed patch.
> > > >
> > > >
> > > 
> > > 
> > > Just checked case for Windows, bug is still here (twelve continuous
> > > ping-pong migrations between two hosts). Also there is very weird
> > > observation that the if VM stalled, password window renders in a
> > > cripple way just after connection when no VNC region refreshes are
> > > made yet by mouse, vm-is-alive case rendering after-c-a-d prompt
> > > properly. Probably it`ll be easier to check behaviour against Paolo`
> > > patchset for RH kernel, though I`m still in progress with cutting out
> > > the rest to adapt them on top of bare 3.10, for ones with access to
> > > git log it should be easier :)
> > 
> > Argh i forgot cpu_sync_all_states. Will resend shortly.
> 
> Well, can you add cpu_synchronize_all_states() before 
> the cpu_clean_all_dirty() call and retry?
> 
> Sorry.

Andrey,

Can you give instructions on how to reproduce please?

- qemu.git codebase (if you have any patches relative to a
given commit id, please provide the patches).
- qemu command line.
- how to recreate the guest disk contents.
- how to recreate the workload being run when migration
fails.
- the migration command relative to the last item.


Thanks

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 21:01                                                                           ` Marcelo Tosatti
@ 2014-08-22 22:21                                                                             ` Andrey Korolyov
  2014-08-24 16:19                                                                             ` Andrey Korolyov
  1 sibling, 0 replies; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-22 22:21 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, Marcin Gibuła,
	qemu-devel@nongnu.org

>
> Andrey,
>
> Can you give instructions on how to reproduce please?
>

Please find answers inline:

> - qemu.git codebase (if you have any patches relative to a
> given commit id, please provide the patches).


I rolled back to the bare 2.1 release to reproduce; on 3.10 I am
hitting the issue both with and without the patches from the previous
message. With 3.16.0, I am no longer able to reproduce the issue on
bare 2.1. Both virtio-dp (dataplane) and regular virtio-blk are
affected, though the former always hits the issue on a single migration
check with the HV timer, so I`d suggest testing against it (as in my
args string below). With the hvapic option set, the emulator tends to
hit the issue more frequently than without it.

> - qemu command line.

qemu-system-x86_64 -enable-kvm -name vm29107 -S -machine
pc-i440fx-2.1,accel=kvm,usb=off -cpu
qemu64,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -bios
/usr/share/seabios/bios.bin-1.7.4 -m 512 -realtime mlock=off -smp
12,sockets=1,cores=12,threads=12 -numa node,nodeid=0,cpus=0-11,mem=512
-uuid 53646494-fe6c-4b5d-b6d0-c333b4f20582 -no-user-config -nodefaults
-device sga -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm29107.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global
PIIX4_PM.disable_s4=1 -boot strict=on -device
usb-ehci,id=usb,bus=pci.0,addr=0x5 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive
file=rbd:dev-rack2/vm29107-WfV:id=qemukvm:key=secret:auth_supported=cephx\;none:mon_host=10.6.0.1\:6789\;10.6.0.3\:6789\;10.6.0.4\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fds=26:27:28:29,id=hostnet0,vhost=on,vhostfds=30:31:32:33
-device virtio-net-pci,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=52:54:00:10:06:9a,bus=pci.0,addr=0x3
-netdev tap,fds=34:35:36:37,id=hostnet1,vhost=on,vhostfds=38:39:40:41
-device virtio-net-pci,mq=on,vectors=10,netdev=hostnet1,id=net1,mac=52:54:00:10:06:9b,bus=pci.0,addr=0x4
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/vm29107.sock,server,nowait
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1
-vnc 0.0.0.0:0 -k en-us -device VGA,id=video0,bus=pci.0,addr=0x2
-object iothread,id=vm29107blk0 -set
device.virtio-disk0.config-wce=off -set device.virtio-disk0.scsi=off
-set device.virtio-disk0.iothread=vm29107blk0 -m
512,slots=62,maxmem=16384M -object
memory-backend-ram,id=mem0,size=256M -device
pc-dimm,id=dimm0,node=0,memdev=mem0 -object
memory-backend-ram,id=mem1,size=256M -device
pc-dimm,id=dimm1,node=0,memdev=mem1 -object
memory-backend-ram,id=mem2,size=256M -device
pc-dimm,id=dimm2,node=0,memdev=mem2 -object
memory-backend-ram,id=mem3,size=256M -device
pc-dimm,id=dimm3,node=0,memdev=mem3 -object
memory-backend-ram,id=mem4,size=256M -device
pc-dimm,id=dimm4,node=0,memdev=mem4 -object
memory-backend-ram,id=mem5,size=256M -device
pc-dimm,id=dimm5,node=0,memdev=mem5 -object
memory-backend-ram,id=mem6,size=256M -device
pc-dimm,id=dimm6,node=0,memdev=mem6 -object
memory-backend-ram,id=mem7,size=256M -device
pc-dimm,id=dimm7,node=0,memdev=mem7 -object
memory-backend-ram,id=mem8,size=256M -device
pc-dimm,id=dimm8,node=0,memdev=mem8 -object
memory-backend-ram,id=mem9,size=256M -device
pc-dimm,id=dimm9,node=0,memdev=mem9 -object
memory-backend-ram,id=mem10,size=256M -device
pc-dimm,id=dimm10,node=0,memdev=mem10 -object
memory-backend-ram,id=mem11,size=256M -device
pc-dimm,id=dimm11,node=0,memdev=mem11 -object
memory-backend-ram,id=mem12,size=256M -device
pc-dimm,id=dimm12,node=0,memdev=mem12 -object
memory-backend-ram,id=mem13,size=256M -device
pc-dimm,id=dimm13,node=0,memdev=mem13

> - how to recreate guest disk contents.

In my case, it is just a bare installation of W2008R2 x64; I can share
it off-list if necessary.

> - how to recreate workload at which point migration
> fails.

Migration does not fail itself; the VM`s disk seemingly does.
Reproduction is quite simple - just boot up a VM, migrate it N times
and try to log in. A failed case will hang on the progress screen (or
you may log in beforehand and check disk availability by any other
convenient method).

> - migration command relative to last item.

It is p2p libvirt live migration:

for i in $(seq 1 6) ; do virsh migrate --live --persistent
--undefinesource vm29107 qemu+tcp://twin2/system ; sleep 5; ssh twin2
"virsh migrate --live --persistent --undefinesource vm29107
qemu+tcp://twin0/system" ; done

>
>
> Thanks

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-22 21:01                                                                           ` Marcelo Tosatti
  2014-08-22 22:21                                                                             ` Andrey Korolyov
@ 2014-08-24 16:19                                                                             ` Andrey Korolyov
  2014-08-24 16:35                                                                               ` Paolo Bonzini
  1 sibling, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-24 16:19 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Fam Zheng, Marcin Gibuła,
	qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 367 bytes --]

Sorry, I was a bit inaccurate in my thoughts on Friday about the
necessary amount of work; the patch applies cleanly on 3.10 with a bit
of monkey rewriting. The attached one fixed the problem for me - it
represents
0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
0bc830b05c667218d703f2026ec866c49df974fc,
44847dea79751e95665a439f8c63a65e51da8e1f and
673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.

[-- Attachment #2: pbonzini-ioapic-3.10.patch --]
[-- Type: text/x-diff, Size: 6818 bytes --]

diff -ru linux-3.10.11/arch/ia64/kvm/kvm-ia64.c linux-3.10.11.patched-ioapic/arch/ia64/kvm/kvm-ia64.c
--- linux-3.10.11/arch/ia64/kvm/kvm-ia64.c	2013-09-08 09:10:14.000000000 +0400
+++ linux-3.10.11.patched-ioapic/arch/ia64/kvm/kvm-ia64.c	2014-08-24 19:49:25.723072383 +0400
@@ -199,6 +199,7 @@
 	case KVM_CAP_IRQCHIP:
 	case KVM_CAP_MP_STATE:
 	case KVM_CAP_IRQ_INJECT_STATUS:
+	case KVM_CAP_IOAPIC_POLARITY_IGNORED:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff -ru linux-3.10.11/arch/x86/kvm/x86.c linux-3.10.11.patched-ioapic/arch/x86/kvm/x86.c
--- linux-3.10.11/arch/x86/kvm/x86.c	2013-09-08 09:10:14.000000000 +0400
+++ linux-3.10.11.patched-ioapic/arch/x86/kvm/x86.c	2014-08-24 19:50:06.553716276 +0400
@@ -2537,6 +2537,7 @@
 	case KVM_CAP_GET_TSC_KHZ:
 	case KVM_CAP_KVMCLOCK_CTRL:
 	case KVM_CAP_READONLY_MEM:
+	case KVM_CAP_IOAPIC_POLARITY_IGNORED:
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 	case KVM_CAP_ASSIGN_DEV_IRQ:
 	case KVM_CAP_PCI_2_3:
diff -ru linux-3.10.11/include/uapi/linux/kvm.h linux-3.10.11.patched-ioapic/include/uapi/linux/kvm.h
--- linux-3.10.11/include/uapi/linux/kvm.h	2013-09-08 09:10:14.000000000 +0400
+++ linux-3.10.11.patched-ioapic/include/uapi/linux/kvm.h	2014-08-24 19:51:10.975577204 +0400
@@ -666,6 +666,7 @@
 #define KVM_CAP_IRQ_MPIC 90
 #define KVM_CAP_PPC_RTAS 91
 #define KVM_CAP_IRQ_XICS 92
+#define KVM_CAP_IOAPIC_POLARITY_IGNORED 93
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff -ru linux-3.10.11/virt/kvm/ioapic.c linux-3.10.11.patched-ioapic/virt/kvm/ioapic.c
--- linux-3.10.11/virt/kvm/ioapic.c	2013-09-08 09:10:14.000000000 +0400
+++ linux-3.10.11.patched-ioapic/virt/kvm/ioapic.c	2014-08-24 19:59:26.755137527 +0400
@@ -50,7 +50,7 @@
 #else
 #define ioapic_debug(fmt, arg...)
 #endif
-static int ioapic_deliver(struct kvm_ioapic *vioapic, int irq,
+static int ioapic_service(struct kvm_ioapic *vioapic, int irq,
 		bool line_status);
 
 static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
@@ -163,23 +163,67 @@
 	return false;
 }
 
-static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx,
-		bool line_status)
+static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
+		int irq_level, bool line_status)
 {
-	union kvm_ioapic_redirect_entry *pent;
-	int injected = -1;
+	union kvm_ioapic_redirect_entry entry;
+	u32 mask = 1 << irq;
+	u32 old_irr;
+	int edge, ret;
 
-	pent = &ioapic->redirtbl[idx];
+	entry = ioapic->redirtbl[irq];
+	edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
 
-	if (!pent->fields.mask) {
-		injected = ioapic_deliver(ioapic, idx, line_status);
-		if (injected && pent->fields.trig_mode == IOAPIC_LEVEL_TRIG)
-			pent->fields.remote_irr = 1;
+	if (!irq_level) {
+		ioapic->irr &= ~mask;
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * Return 0 for coalesced interrupts; for edge-triggered interrupts,
+	 * this only happens if a previous edge has not been delivered due
+	 * do masking.  For level interrupts, the remote_irr field tells
+	 * us if the interrupt is waiting for an EOI.
+	 *
+	 * RTC is special: it is edge-triggered, but userspace likes to know
+	 * if it has been already ack-ed via EOI because coalesced RTC
+	 * interrupts lead to time drift in Windows guests.  So we track
+	 * EOI manually for the RTC interrupt.
+	 */
+	if (irq == RTC_GSI && line_status &&
+		rtc_irq_check_coalesced(ioapic)) {
+		ret = 0;
+		goto out;
 	}
 
-	return injected;
+	old_irr = ioapic->irr;
+	ioapic->irr |= mask;
+	if ((edge && old_irr == ioapic->irr) ||
+	    (!edge && entry.fields.remote_irr)) {
+		ret = 0;
+		goto out;
+	}
+
+	ret = ioapic_service(ioapic, irq, line_status);
+
+out:
+	trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
+	return ret;
+}
+
+static void kvm_ioapic_inject_all(struct kvm_ioapic *ioapic, unsigned long irr)
+{
+	u32 idx;
+
+	rtc_irq_eoi_tracking_reset(ioapic);
+	for_each_set_bit(idx, &irr, IOAPIC_NUM_PINS)
+		ioapic_set_irq(ioapic, idx, 1, true);
+
+	kvm_rtc_eoi_tracking_restore_all(ioapic);
 }
 
+
 static void update_handled_vectors(struct kvm_ioapic *ioapic)
 {
 	DECLARE_BITMAP(handled_vectors, 256);
@@ -282,12 +326,15 @@
 	}
 }
 
-static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq, bool line_status)
+static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
 {
 	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
 	struct kvm_lapic_irq irqe;
 	int ret;
 
+	if (entry->fields.mask)
+	    return -1;
+
 	ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
 		     "vector=%x trig_mode=%x\n",
 		     entry->fields.dest_id, entry->fields.dest_mode,
@@ -302,6 +349,10 @@
 	irqe.level = 1;
 	irqe.shorthand = 0;
 
+	if (irqe.trig_mode == IOAPIC_EDGE_TRIG)
+	    ioapic->irr &= ~(1 << irq);
+
+
 	if (irq == RTC_GSI && line_status) {
 		BUG_ON(ioapic->rtc_status.pending_eoi != 0);
 		ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe,
@@ -309,6 +360,8 @@
 		ioapic->rtc_status.pending_eoi = ret;
 	} else
 		ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe, NULL);
+	if (ret && irqe.trig_mode == IOAPIC_LEVEL_TRIG)
+	    entry->fields.remote_irr = 1;
 
 	return ret;
 }
@@ -316,39 +369,15 @@
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
 		       int level, bool line_status)
 {
-	u32 old_irr;
-	u32 mask = 1 << irq;
-	union kvm_ioapic_redirect_entry entry;
 	int ret, irq_level;
 
 	BUG_ON(irq < 0 || irq >= IOAPIC_NUM_PINS);
 
 	spin_lock(&ioapic->lock);
-	old_irr = ioapic->irr;
 	irq_level = __kvm_irq_line_state(&ioapic->irq_states[irq],
 					 irq_source_id, level);
-	entry = ioapic->redirtbl[irq];
-	irq_level ^= entry.fields.polarity;
-	if (!irq_level) {
-		ioapic->irr &= ~mask;
-		ret = 1;
-	} else {
-		int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
+	ret = ioapic_set_irq(ioapic, irq, irq_level, line_status);
 
-		if (irq == RTC_GSI && line_status &&
-			rtc_irq_check_coalesced(ioapic)) {
-			ret = 0; /* coalesced */
-			goto out;
-		}
-		ioapic->irr |= mask;
-		if ((edge && old_irr != ioapic->irr) ||
-		    (!edge && !entry.fields.remote_irr))
-			ret = ioapic_service(ioapic, irq, line_status);
-		else
-			ret = 0; /* report coalesced interrupt */
-	}
-out:
-	trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
 	spin_unlock(&ioapic->lock);
 
 	return ret;
@@ -394,7 +423,7 @@
 
 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 		ent->fields.remote_irr = 0;
-		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
+		if (ioapic->irr & (1 << i))
 			ioapic_service(ioapic, i, false);
 	}
 }
@@ -595,9 +624,10 @@
 
 	spin_lock(&ioapic->lock);
 	memcpy(ioapic, state, sizeof(struct kvm_ioapic_state));
+	ioapic->irr = 0;
 	update_handled_vectors(ioapic);
 	kvm_vcpu_request_scan_ioapic(kvm);
-	kvm_rtc_eoi_tracking_restore_all(ioapic);
+	kvm_ioapic_inject_all(ioapic, state->irr);
 	spin_unlock(&ioapic->lock);
 	return 0;
 }

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-24 16:19                                                                             ` Andrey Korolyov
@ 2014-08-24 16:35                                                                               ` Paolo Bonzini
  2014-08-24 16:57                                                                                 ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2014-08-24 16:35 UTC (permalink / raw)
  To: Andrey Korolyov, Marcelo Tosatti
  Cc: Amit Shah, Marcin Gibuła, Fam Zheng, qemu-devel@nongnu.org

Il 24/08/2014 18:19, Andrey Korolyov ha scritto:
> Sorry, I was a bit inaccurate in my thoughts at Fri about necessary
> amount of work, patch lays perfectly on 3.10 with bit of monkey
> rewrites. The attached one fixed problem for me - it represents
> 0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
> 0bc830b05c667218d703f2026ec866c49df974fc,
> 44847dea79751e95665a439f8c63a65e51da8e1f and
> 673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.

So, with these changes, Marcelo's patch does not hang up your guest anymore?

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-24 16:35                                                                               ` Paolo Bonzini
@ 2014-08-24 16:57                                                                                 ` Andrey Korolyov
  2014-08-24 18:51                                                                                   ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-24 16:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Marcin Gibuła, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

On Sun, Aug 24, 2014 at 8:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 24/08/2014 18:19, Andrey Korolyov ha scritto:
>> Sorry, I was a bit inaccurate in my thoughts at Fri about necessary
>> amount of work, patch lays perfectly on 3.10 with bit of monkey
>> rewrites. The attached one fixed problem for me - it represents
>> 0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
>> 0bc830b05c667218d703f2026ec866c49df974fc,
>> 44847dea79751e95665a439f8c63a65e51da8e1f and
>> 673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.
>
> So, with these changes, Marcelo's patch does not hang up your guest anymore?
>
> Paolo
>

If I may reword: Marcelo`s proposed state sync, with the
revert-of-the-revert of agraf`s patch, does not break anything for
Windows (migration works well with any variant of the emulator on the
modified kernel modules). Let me check whether the initially reported
issue (lost I/O interrupts) is gone in the current situation (patched
kernel plus Marcelo`s patch).

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-24 16:57                                                                                 ` Andrey Korolyov
@ 2014-08-24 18:51                                                                                   ` Andrey Korolyov
  2014-08-24 20:14                                                                                     ` Andrey Korolyov
  2014-09-04 16:38                                                                                     ` Marcelo Tosatti
  0 siblings, 2 replies; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-24 18:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Marcin Gibuła, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

On Sun, Aug 24, 2014 at 8:57 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Sun, Aug 24, 2014 at 8:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 24/08/2014 18:19, Andrey Korolyov ha scritto:
>>> Sorry, I was a bit inaccurate in my thoughts at Fri about necessary
>>> amount of work, patch lays perfectly on 3.10 with bit of monkey
>>> rewrites. The attached one fixed problem for me - it represents
>>> 0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
>>> 0bc830b05c667218d703f2026ec866c49df974fc,
>>> 44847dea79751e95665a439f8c63a65e51da8e1f and
>>> 673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.
>>
>> So, with these changes, Marcelo's patch does not hang up your guest anymore?
>>
>> Paolo
>>
>
> If I may reword, Marcelo`s proposed states sync with revert-revert of
> agraf`s patch, does not break anything for Windows (migration works
> well for any variant of emulator with modified kernel modules). Let me
> check if initially reported issue (lost I/O interrupts) is gone for
> the current situation (patched kernel plus Marcelo` patch).


patched kernel + any 2.1 variant + Windows = works
patched kernel + patched 2.1 + Linux + disk workload = works fine
bare kernel + any 2.1 variant + Windows = disk stale
bare kernel + proposed patch from Marcelo + Linux + disk workload =
works fine (bare kernel + 2.1 release works at this point as tested
earlier)


Also, guest 3.10.52 caused literally a rain of WTFs, crashing the
emulator with an emulation error in every live-migration case tested...
3.10.11 works fine. There will be another report soon, I suppose,
though it is nearly impossible to do a proper bisection in a sane
amount of time.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-24 18:51                                                                                   ` Andrey Korolyov
@ 2014-08-24 20:14                                                                                     ` Andrey Korolyov
  2014-08-25 10:45                                                                                       ` Paolo Bonzini
  2014-09-04 16:38                                                                                     ` Marcelo Tosatti
  1 sibling, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-24 20:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Marcin Gibuła, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

Forgot to mention: the results above are with the _actual_ patch from
above. Adding cpu_synchronize_all_states() brings the old bug with
lost interrupts back.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-24 20:14                                                                                     ` Andrey Korolyov
@ 2014-08-25 10:45                                                                                       ` Paolo Bonzini
  2014-08-25 10:51                                                                                         ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2014-08-25 10:45 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Marcin Gibuła, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

Il 24/08/2014 22:14, Andrey Korolyov ha scritto:
> Forgot to mention, _actual_ patch from above. Adding
> cpu_synchronize_all_states() bringing old bug with lost interrupts
> back.

Are you adding it before or after cpu_clean_all_dirty?

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-25 10:45                                                                                       ` Paolo Bonzini
@ 2014-08-25 10:51                                                                                         ` Andrey Korolyov
  0 siblings, 0 replies; 76+ messages in thread
From: Andrey Korolyov @ 2014-08-25 10:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amit Shah, Marcin Gibuła, Marcelo Tosatti, Fam Zheng,
	qemu-devel@nongnu.org

On Mon, Aug 25, 2014 at 2:45 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 24/08/2014 22:14, Andrey Korolyov ha scritto:
>> Forgot to mention, _actual_ patch from above. Adding
>> cpu_synchronize_all_states() bringing old bug with lost interrupts
>> back.
>
> Are you adding it before or after cpu_clean_all_dirty?
>
> Paolo

I`ve added it right before, as suggested.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-24 18:51                                                                                   ` Andrey Korolyov
  2014-08-24 20:14                                                                                     ` Andrey Korolyov
@ 2014-09-04 16:38                                                                                     ` Marcelo Tosatti
  2014-09-04 16:52                                                                                       ` Andrey Korolyov
  1 sibling, 1 reply; 76+ messages in thread
From: Marcelo Tosatti @ 2014-09-04 16:38 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Marcin Gibuła, Fam Zheng,
	qemu-devel@nongnu.org

On Sun, Aug 24, 2014 at 10:51:38PM +0400, Andrey Korolyov wrote:
> On Sun, Aug 24, 2014 at 8:57 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> > On Sun, Aug 24, 2014 at 8:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> Il 24/08/2014 18:19, Andrey Korolyov ha scritto:
> >>> Sorry, I was a bit inaccurate in my thoughts at Fri about necessary
> >>> amount of work, patch lays perfectly on 3.10 with bit of monkey
> >>> rewrites. The attached one fixed problem for me - it represents
> >>> 0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
> >>> 0bc830b05c667218d703f2026ec866c49df974fc,
> >>> 44847dea79751e95665a439f8c63a65e51da8e1f and
> >>> 673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.
> >>
> >> So, with these changes, Marcelo's patch does not hang up your guest anymore?
> >>
> >> Paolo
> >>
> >
> > If I may reword, Marcelo`s proposed states sync with revert-revert of
> > agraf`s patch, does not break anything for Windows (migration works
> > well for any variant of emulator with modified kernel modules). Let me
> > check if initially reported issue (lost I/O interrupts) is gone for
> > the current situation (patched kernel plus Marcelo` patch).
> 
> 
> patched kernel + any 2.1 variant + Windows = works
> patched kernel + patched 2.1 + Linux + disk workload = works fine

Andrey,

What patch is this again?

> bare kernel + any 2.1 variant + Windows = disk stale
> bare kernel + proposed patch from Marcelo + Linux + disk workload =
> works fine (bare kernel + 2.1 release works at this point as tested
> earlier)
> 
> Also guest 3.10.52 caused literally a rain of WTFs crashing emulator
> with an emulation error in every case tested with live migration...
> 3.10.11 works fine. There will be another report soon I suppose though
> it is nearly impossible to do a proper bisection in a sane time.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-09-04 16:38                                                                                     ` Marcelo Tosatti
@ 2014-09-04 16:52                                                                                       ` Andrey Korolyov
  2014-09-04 18:54                                                                                         ` Marcelo Tosatti
  0 siblings, 1 reply; 76+ messages in thread
From: Andrey Korolyov @ 2014-09-04 16:52 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Marcin Gibuła, Fam Zheng,
	qemu-devel@nongnu.org

On Thu, Sep 4, 2014 at 8:38 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Sun, Aug 24, 2014 at 10:51:38PM +0400, Andrey Korolyov wrote:
>> On Sun, Aug 24, 2014 at 8:57 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
>> > On Sun, Aug 24, 2014 at 8:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> >> Il 24/08/2014 18:19, Andrey Korolyov ha scritto:
>> >>> Sorry, I was a bit inaccurate in my thoughts at Fri about necessary
>> >>> amount of work, patch lays perfectly on 3.10 with bit of monkey
>> >>> rewrites. The attached one fixed problem for me - it represents
>> >>> 0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
>> >>> 0bc830b05c667218d703f2026ec866c49df974fc,
>> >>> 44847dea79751e95665a439f8c63a65e51da8e1f and
>> >>> 673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.
>> >>
>> >> So, with these changes, Marcelo's patch does not hang up your guest anymore?
>> >>
>> >> Paolo
>> >>
>> >
>> > If I may reword, Marcelo`s proposed states sync with revert-revert of
>> > agraf`s patch, does not break anything for Windows (migration works
>> > well for any variant of emulator with modified kernel modules). Let me
>> > check if initially reported issue (lost I/O interrupts) is gone for
>> > the current situation (patched kernel plus Marcelo` patch).
>>
>>
>> patched kernel + any 2.1 variant + Windows = works
>> patched kernel + patched 2.1 + Linux + disk workload = works fine
>
> Andrey,
>
> What patch is this again ?
>
>

It is attached to the message with the highest quoting order above
[1]; basically it is a monkey adaptation of the newer ioapic patches
for 3.10. The RH kernel includes them, so Centos7/RH7 users should not
observe the reported problem.

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg252526.html

Using the opportunity - what are the plans, if there are any, to revive
agraf`s patch, which was the reason for the initial issue in this
thread? The Windows issue is most probably unrelated to it, and I`d be
happy to check patch proposals, because VMs with large uptimes suffer a
lot from the negative time shift that appears on live migration.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-09-04 16:52                                                                                       ` Andrey Korolyov
@ 2014-09-04 18:54                                                                                         ` Marcelo Tosatti
  2014-09-04 18:54                                                                                           ` Marcelo Tosatti
  0 siblings, 1 reply; 76+ messages in thread
From: Marcelo Tosatti @ 2014-09-04 18:54 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Marcin Gibuła, Fam Zheng,
	qemu-devel@nongnu.org

On Thu, Sep 04, 2014 at 08:52:00PM +0400, Andrey Korolyov wrote:
> On Thu, Sep 4, 2014 at 8:38 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Sun, Aug 24, 2014 at 10:51:38PM +0400, Andrey Korolyov wrote:
> >> On Sun, Aug 24, 2014 at 8:57 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> >> > On Sun, Aug 24, 2014 at 8:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> >> Il 24/08/2014 18:19, Andrey Korolyov ha scritto:
> >> >>> Sorry, I was a bit inaccurate in my thoughts at Fri about necessary
> >> >>> amount of work, patch lays perfectly on 3.10 with bit of monkey
> >> >>> rewrites. The attached one fixed problem for me - it represents
> >> >>> 0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
> >> >>> 0bc830b05c667218d703f2026ec866c49df974fc,
> >> >>> 44847dea79751e95665a439f8c63a65e51da8e1f and
> >> >>> 673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.
> >> >>
> >> >> So, with these changes, Marcelo's patch does not hang up your guest anymore?
> >> >>
> >> >> Paolo
> >> >>
> >> >
> >> > If I may reword, Marcelo`s proposed states sync with revert-revert of
> >> > agraf`s patch, does not break anything for Windows (migration works
> >> > well for any variant of emulator with modified kernel modules). Let me
> >> > check if initially reported issue (lost I/O interrupts) is gone for
> >> > the current situation (patched kernel plus Marcelo` patch).
> >>
> >>
> >> patched kernel + any 2.1 variant + Windows = works
> >> patched kernel + patched 2.1 + Linux + disk workload = works fine
> >
> > Andrey,
> >
> > What patch is this again ?
> >
> >
> 
> It is attached to the message which has a highest quoting order above
> 1), basically it is a monkey adaptation of newer ioapic patches for
> 3.10. RH kernel includes them, so Centos7/RH7 users should not observe
> reported problem.
> 
> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg252526.html
> 
> Using the opportunity - what are plans, if there are any, to revive
> agraf`s patch which was the reason of an initial issue in this thread?
> The Windows issue is most probably unrelated to it and I`d be happy to
> check patch proposals because there are a lot of suffering with VMs
> with large uptime and live migration when negative time shift appears.

I'll try to confirm via Marcin's testcase (vfio+VM migration on Linux
guests) that the proposed patch fixes the issue.

I am getting a BSOD on Win2008-R2-x64 while trying to install the
virtio-win driver (viostor-related BSOD).

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-09-04 18:54                                                                                         ` Marcelo Tosatti
@ 2014-09-04 18:54                                                                                           ` Marcelo Tosatti
  2014-09-04 19:13                                                                                             ` Andrey Korolyov
  0 siblings, 1 reply; 76+ messages in thread
From: Marcelo Tosatti @ 2014-09-04 18:54 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Amit Shah, Paolo Bonzini, Marcin Gibuła, Fam Zheng,
	qemu-devel@nongnu.org

On Thu, Sep 04, 2014 at 03:54:01PM -0300, Marcelo Tosatti wrote:
> On Thu, Sep 04, 2014 at 08:52:00PM +0400, Andrey Korolyov wrote:
> > On Thu, Sep 4, 2014 at 8:38 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > On Sun, Aug 24, 2014 at 10:51:38PM +0400, Andrey Korolyov wrote:
> > >> On Sun, Aug 24, 2014 at 8:57 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> > >> > On Sun, Aug 24, 2014 at 8:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > >> >> Il 24/08/2014 18:19, Andrey Korolyov ha scritto:
> > >> >>> Sorry, I was a bit inaccurate in my thoughts at Fri about necessary
> > >> >>> amount of work, patch lays perfectly on 3.10 with bit of monkey
> > >> >>> rewrites. The attached one fixed problem for me - it represents
> > >> >>> 0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
> > >> >>> 0bc830b05c667218d703f2026ec866c49df974fc,
> > >> >>> 44847dea79751e95665a439f8c63a65e51da8e1f and
> > >> >>> 673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.
> > >> >>
> > >> >> So, with these changes, Marcelo's patch does not hang up your guest anymore?
> > >> >>
> > >> >> Paolo
> > >> >>
> > >> >
> > >> > If I may reword: Marcelo's proposed state sync, combined with
> > >> > re-reverting agraf's patch, does not break anything for Windows
> > >> > (migration works well with any variant of the emulator once the
> > >> > kernel modules are patched). Let me check whether the initially
> > >> > reported issue (lost I/O interrupts) is gone in the current setup
> > >> > (patched kernel plus Marcelo's patch).
> > >>
> > >>
> > >> patched kernel + any 2.1 variant + Windows = works
> > >> patched kernel + patched 2.1 + Linux + disk workload = works fine
> > >
> > > Andrey,
> > >
> > > What patch is this again?
> > >
> > >
> > 
> > It is attached to the most deeply quoted message above [1]; basically
> > it is a monkey adaptation of the newer ioapic patches to 3.10. The RH
> > kernel already includes them, so CentOS 7/RHEL 7 users should not see
> > the reported problem.
> > 
> > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg252526.html
> > 
> > Taking the opportunity - are there any plans to revive agraf's patch,
> > which was the cause of the initial issue in this thread? The Windows
> > issue is most probably unrelated to it, and I'd be happy to test patch
> > proposals, because VMs with large uptimes suffer a lot on live
> > migration when a negative time shift appears.
> 
> I'll try to confirm via Marcin's testcase (vfio + VM migration on Linux
> guests) that the proposed patch fixes the issue.
> 
> I am getting a BSOD on Win2008-R2-x64 while trying to install the
> virtio-win driver (a viostor-related BSOD).

What version of virtio-win are you using?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-09-04 18:54                                                                                           ` Marcelo Tosatti
@ 2014-09-04 19:13                                                                                             ` Andrey Korolyov
  0 siblings, 0 replies; 76+ messages in thread
From: Andrey Korolyov @ 2014-09-04 19:13 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Amit Shah, Paolo Bonzini, Marcin Gibuła, Fam Zheng,
	qemu-devel@nongnu.org

On Thu, Sep 4, 2014 at 10:54 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Sep 04, 2014 at 03:54:01PM -0300, Marcelo Tosatti wrote:
>> On Thu, Sep 04, 2014 at 08:52:00PM +0400, Andrey Korolyov wrote:
>> > On Thu, Sep 4, 2014 at 8:38 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > > On Sun, Aug 24, 2014 at 10:51:38PM +0400, Andrey Korolyov wrote:
>> > >> On Sun, Aug 24, 2014 at 8:57 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
>> > >> > On Sun, Aug 24, 2014 at 8:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> > >> >> On 24/08/2014 18:19, Andrey Korolyov wrote:
>> > >> >>> Sorry, I was a bit inaccurate on Friday about the amount of work
>> > >> >>> required - the patch applies cleanly on 3.10 with a bit of monkey
>> > >> >>> rewriting. The attached one fixed the problem for me - it
>> > >> >>> corresponds to
>> > >> >>> 0b10a1c87a2b0fb459baaefba9cb163dbb8d3344,
>> > >> >>> 0bc830b05c667218d703f2026ec866c49df974fc,
>> > >> >>> 44847dea79751e95665a439f8c63a65e51da8e1f and
>> > >> >>> 673f7b4257a1fe7b181e1a1182ecc2b6b2b795f1.
>> > >> >>
>> > >> >> So, with these changes, Marcelo's patch does not hang up your guest anymore?
>> > >> >>
>> > >> >> Paolo
>> > >> >>
>> > >> >
>> > >> > If I may reword: Marcelo's proposed state sync, combined with
>> > >> > re-reverting agraf's patch, does not break anything for Windows
>> > >> > (migration works well with any variant of the emulator once the
>> > >> > kernel modules are patched). Let me check whether the initially
>> > >> > reported issue (lost I/O interrupts) is gone in the current setup
>> > >> > (patched kernel plus Marcelo's patch).
>> > >>
>> > >>
>> > >> patched kernel + any 2.1 variant + Windows = works
>> > >> patched kernel + patched 2.1 + Linux + disk workload = works fine
>> > >
>> > > Andrey,
>> > >
>> > > What patch is this again?
>> > >
>> > >
>> >
>> > It is attached to the most deeply quoted message above [1]; basically
>> > it is a monkey adaptation of the newer ioapic patches to 3.10. The RH
>> > kernel already includes them, so CentOS 7/RHEL 7 users should not see
>> > the reported problem.
>> >
>> > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg252526.html
>> >
>> > Taking the opportunity - are there any plans to revive agraf's patch,
>> > which was the cause of the initial issue in this thread? The Windows
>> > issue is most probably unrelated to it, and I'd be happy to test patch
>> > proposals, because VMs with large uptimes suffer a lot on live
>> > migration when a negative time shift appears.
>>
>> I'll try to confirm via Marcin's testcase (vfio + VM migration on Linux
>> guests) that the proposed patch fixes the issue.
>>
>> I am getting a BSOD on Win2008-R2-x64 while trying to install the
>> virtio-win driver (a viostor-related BSOD).
>
> What version of virtio-win are you using?
>

The bug appears with both 0.1.76 and 0.1.81 (and it is probably at
least virtio-specific, if not more general in nature). The BSOD sounds
strange to me - are you loading the virtio drivers at the install
stage, or are you switching the disk model after installation? I've
never gone the second way, so I can't tell whether the BSOD shows up
there or not.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-08-04 16:30                                                 ` Marcin Gibuła
  2014-08-04 18:30                                                   ` Paolo Bonzini
@ 2014-10-09 19:07                                                   ` Eduardo Habkost
  2014-10-10  7:33                                                     ` Marcin Gibuła
  1 sibling, 1 reply; 76+ messages in thread
From: Eduardo Habkost @ 2014-10-09 19:07 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Andrey Korolyov, Fam Zheng, Marcelo Tosatti,
	qemu-devel@nongnu.org, Amit Shah, Paolo Bonzini

On Mon, Aug 04, 2014 at 06:30:09PM +0200, Marcin Gibuła wrote:
> On 2014-07-31 13:27, Marcin Gibuła wrote:
> >>>Can you dump *env before and after the call to kvm_arch_get_registers?
> >>
> >>Yes, but it seems they are equal - I used memcmp() to compare them. Is
> >>there any other side effect that cpu_synchronize_all_states() may have?
> >
> >I think I found it.
> >
> >The hang happens because, when the second call to
> >kvm_arch_get_registers() is skipped, kvm_get_apic(), which updates
> >cpu->apic_state, is skipped as well.
> 
> Paolo,
> 
> is this analysis deep enough for you? I don't know whether this can be
> fixed with the existing API, as cpu_synchronize_all_states() is an
> all-or-nothing operation.
> 
> Kvmclock only needs it to read the current CPU registers, so syncing
> everything is not really necessary. Perhaps exporting one of the
> kvm_arch_get_* functions would be enough, and it wouldn't interfere
> with the lazy get/put.
> 
> On the other hand, if in the future any other device adds
> cpu_synchronize_all_states() to its state change callback, it could
> hit the same error, so perhaps a more generic approach is needed.

Does anybody know why the APIC state loaded by the first call to
kvm_arch_get_registers() is wrong in the first place? What exactly is
different in the APIC state in the second kvm_arch_get_registers() call,
and when/why does it change?

If cpu_synchronize_state() can do the wrong thing when called at the
wrong moment, then we may have other hidden bugs, because the user can
trigger cpu_synchronize_all_states() calls arbitrarily using monitor
commands.
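
For instance, roughly (a sketch from memory - the exact monitor code
and helper names may differ from what is in the tree):

/* HMP "info registers" forces a register fetch at a moment the user
 * chooses, long after the vCPU last left KVM_RUN: */
static void hmp_info_registers(Monitor *mon, const QDict *qdict)
{
    CPUState *cpu = mon_get_cpu();  /* monitor's current CPU */

    cpu_synchronize_state(cpu);     /* lazy fetch from KVM; afterwards
                                       the cached copy is trusted */
    cpu_dump_state(cpu, (FILE *)mon, (fprintf_function)monitor_printf,
                   CPU_DUMP_FPU);
}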

-- 
Eduardo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-10-09 19:07                                                   ` Eduardo Habkost
@ 2014-10-10  7:33                                                     ` Marcin Gibuła
  2014-10-11 12:58                                                       ` Eduardo Habkost
  0 siblings, 1 reply; 76+ messages in thread
From: Marcin Gibuła @ 2014-10-10  7:33 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Andrey Korolyov, Fam Zheng, Marcelo Tosatti,
	qemu-devel@nongnu.org, Amit Shah, Paolo Bonzini

> Does anybody know why the APIC state loaded by the first call to
> kvm_arch_get_registers() is wrong in the first place? What exactly is
> different in the APIC state in the second kvm_arch_get_registers()
> call, and when/why does it change?
>
> If cpu_synchronize_state() can do the wrong thing when called at the
> wrong moment, then we may have other hidden bugs, because the user can
> trigger cpu_synchronize_all_states() calls arbitrarily using monitor
> commands.

My guess is that it's not wrong, it's just outdated by the time the
second call occurs. Maybe it's an ordering issue - could the kvmclock
state change handler be called before other activity is suspended?

I didn't pursue it further, because I don't know much (anything,
really) about the QEMU/APIC internals or how to track their changes.
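
To illustrate the ordering I have in mind (a rough sketch from memory,
not the exact code in the tree):

static void kvmclock_vm_state_change(void *opaque, int running,
                                     RunState state)
{
    if (!running) {
        /* On stop, this fetches all state from KVM - including the
         * APIC, via kvm_arch_get_registers()/kvm_get_apic() - and
         * marks every vCPU as already synchronized. */
        cpu_synchronize_all_states();

        /* If the in-kernel APIC state can still change after this
         * point (say, while the remaining vCPUs are being stopped),
         * the cached copy goes stale, and later syncs are skipped
         * because the vCPUs are already marked as synced. */
    }
}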

-- 
mg

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
  2014-10-10  7:33                                                     ` Marcin Gibuła
@ 2014-10-11 12:58                                                       ` Eduardo Habkost
  0 siblings, 0 replies; 76+ messages in thread
From: Eduardo Habkost @ 2014-10-11 12:58 UTC (permalink / raw)
  To: Marcin Gibuła
  Cc: Andrey Korolyov, Fam Zheng, Marcelo Tosatti,
	qemu-devel@nongnu.org, Amit Shah, Paolo Bonzini

On Fri, Oct 10, 2014 at 09:33:04AM +0200, Marcin Gibuła wrote:
> >Does anybody know why the APIC state loaded by the first call to
> >kvm_arch_get_registers() is wrong in the first place? What exactly is
> >different in the APIC state in the second kvm_arch_get_registers()
> >call, and when/why does it change?
> >
> >If cpu_synchronize_state() can do the wrong thing when called at the
> >wrong moment, then we may have other hidden bugs, because the user can
> >trigger cpu_synchronize_all_states() calls arbitrarily using monitor
> >commands.
> 
> My guess is that it's not wrong, it's just outdated by the time the
> second call occurs. Maybe it's an ordering issue - could the kvmclock
> state change handler be called before other activity is suspended?

Changing the ordering could just hide the problem. If the APIC state in
the kernel can change outside ioctl(KVM_RUN) calls, then it can't be
cached the same way the other registers loaded by
kvm_arch_get_registers() are. The current code invalidates the QEMU
register data (i.e. sets kvm_vcpu_dirty=false) in only three cases:
1) VCPU init; 2) VCPU reset; 3) in kvm_cpu_exec(), immediately before
calling ioctl(KVM_RUN).
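
For reference, the lazy sync looks roughly like this (simplified, from
memory - not an exact copy of kvm-all.c):

static void do_kvm_cpu_synchronize_state(void *arg)
{
    CPUState *cpu = arg;

    if (!cpu->kvm_vcpu_dirty) {
        kvm_arch_get_registers(cpu); /* also refreshes the APIC state */
        cpu->kvm_vcpu_dirty = true;  /* later calls become no-ops     */
    }
}

So once the flag is set, nothing except the three cases above forces a
re-fetch, no matter what happens to the in-kernel APIC state in the
meantime.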

Maybe we need a separate API to load the APIC state? I don't know who
the actual users of X86CPU.apic_state are today.
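
Something like this, perhaps (purely hypothetical - the name and shape
are mine, not an existing API):

/* Refresh only the in-kernel APIC state, bypassing the lazy
 * general-register cache, so a caller like kvmclock doesn't have to
 * sync everything: */
void kvm_cpu_synchronize_apic_state(CPUState *cpu)
{
    if (kvm_irqchip_in_kernel()) {
        kvm_get_apic(cpu); /* updates X86CPU.apic_state from KVM */
    }
}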

-- 
Eduardo

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2014-10-11 12:59 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-13 12:28 [Qemu-devel] latest rc: virtio-blk hangs forever after migration Andrey Korolyov
2014-07-13 15:29 ` Andrey Korolyov
2014-07-15 15:57   ` Paolo Bonzini
2014-07-15 17:32     ` Andrey Korolyov
2014-07-15 17:39       ` Andrey Korolyov
2014-07-15  5:03 ` Amit Shah
2014-07-15  6:52   ` Andrey Korolyov
2014-07-15 14:01     ` Andrey Korolyov
2014-07-15 21:09       ` Marcelo Tosatti
2014-07-15 21:25         ` Andrey Korolyov
2014-07-15 22:01           ` Paolo Bonzini
2014-07-15 23:40             ` Andrey Korolyov
2014-07-15 23:47               ` Marcelo Tosatti
2014-07-16  1:16               ` Marcelo Tosatti
2014-07-16  8:38                 ` Andrey Korolyov
2014-07-16 11:52                   ` Marcelo Tosatti
2014-07-16 13:24                     ` Andrey Korolyov
2014-07-16 18:25                       ` Andrey Korolyov
2014-07-16 21:28                         ` Marcin Gibuła
2014-07-16 21:36                           ` Andrey Korolyov
2014-07-17  9:49                             ` Marcin Gibuła
2014-07-17 11:20                               ` Marcin Gibuła
2014-07-17 11:54                               ` Marcin Gibuła
2014-07-17 12:06                                 ` Andrey Korolyov
2014-07-17 13:25                                   ` Marcin Gibuła
2014-07-17 19:18                                     ` Dr. David Alan Gilbert
2014-07-17 20:33                                       ` Marcin Gibuła
2014-07-17 20:50                                     ` Andrey Korolyov
2014-07-18  8:21                                       ` Marcin Gibuła
2014-07-18  8:36                                         ` Andrey Korolyov
2014-07-18  8:44                                           ` Marcin Gibuła
2014-07-18  8:51                                             ` Paolo Bonzini
2014-07-18  8:48                                     ` Paolo Bonzini
2014-07-18  8:57                                       ` Amit Shah
2014-07-18  9:32                                       ` Marcin Gibuła
2014-07-18  9:37                                         ` Paolo Bonzini
2014-07-18  9:48                                           ` Marcin Gibuła
2014-07-29 16:58                                       ` Paolo Bonzini
2014-07-30 12:02                                         ` Marcin Gibuła
2014-07-30 13:38                                           ` Paolo Bonzini
2014-07-30 22:12                                             ` Marcin Gibuła
2014-07-31 11:27                                               ` Marcin Gibuła
2014-08-04 16:30                                                 ` Marcin Gibuła
2014-08-04 18:30                                                   ` Paolo Bonzini
2014-08-08 21:37                                                     ` Marcelo Tosatti
2014-08-09  6:35                                                       ` Paolo Bonzini
2014-08-21 15:48                                                         ` Andrey Korolyov
2014-08-21 16:41                                                           ` Andrey Korolyov
2014-08-21 16:44                                                             ` Paolo Bonzini
2014-08-21 17:51                                                               ` Andrey Korolyov
2014-08-22 16:44                                                                 ` Andrey Korolyov
2014-08-22 17:45                                                                   ` Marcelo Tosatti
2014-08-22 18:39                                                                     ` Andrey Korolyov
2014-08-22 19:05                                                                       ` Marcelo Tosatti
2014-08-22 19:05                                                                         ` Marcelo Tosatti
2014-08-22 19:51                                                                           ` Andrey Korolyov
2014-08-22 21:01                                                                           ` Marcelo Tosatti
2014-08-22 22:21                                                                             ` Andrey Korolyov
2014-08-24 16:19                                                                             ` Andrey Korolyov
2014-08-24 16:35                                                                               ` Paolo Bonzini
2014-08-24 16:57                                                                                 ` Andrey Korolyov
2014-08-24 18:51                                                                                   ` Andrey Korolyov
2014-08-24 20:14                                                                                     ` Andrey Korolyov
2014-08-25 10:45                                                                                       ` Paolo Bonzini
2014-08-25 10:51                                                                                         ` Andrey Korolyov
2014-09-04 16:38                                                                                     ` Marcelo Tosatti
2014-09-04 16:52                                                                                       ` Andrey Korolyov
2014-09-04 18:54                                                                                         ` Marcelo Tosatti
2014-09-04 18:54                                                                                           ` Marcelo Tosatti
2014-09-04 19:13                                                                                             ` Andrey Korolyov
2014-08-22 17:55                                                                   ` Paolo Bonzini
2014-10-09 19:07                                                   ` Eduardo Habkost
2014-10-10  7:33                                                     ` Marcin Gibuła
2014-10-11 12:58                                                       ` Eduardo Habkost
2014-07-16  7:35         ` Marcin Gibuła
2014-07-16 12:00           ` Marcelo Tosatti
