* [Qemu-devel] the whole virtual machine hangs when IO does not come back!
@ 2014-08-11 8:33 Bin Wu
2014-08-11 14:21 ` Stefan Hajnoczi
0 siblings, 1 reply; 10+ messages in thread
From: Bin Wu @ 2014-08-11 8:33 UTC (permalink / raw)
To: qemu-devel, peter.huangpeng
Hi,
I tested the reliability of QEMU in an IPSAN environment as follows:
(1) create one VM on an x86 server connected to the IPSAN; the VM has
only one system volume, which is on the IPSAN;
(2) disconnect the network between the server and the IPSAN. On the
server, I run "multipath" software that can hold I/O for a long
(configurable) time when the network is disconnected;
(3) about 30 seconds later, the whole VM hangs; nothing can be done
with the VM!
Then I used the "gstack" tool to collect the stacks of all QEMU
threads, which looked like this:
Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
#0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
#1 0x00007fd84410ceff in aio_poll ()
#2 0x00007fd84429bb05 in qemu_aio_wait ()
#3 0x00007fd844120f51 in bdrv_drain_all ()
#4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
#5 0x00007fd8441f216e in bmdma_write ()
#6 0x00007fd8443a93cf in memory_region_write_accessor ()
#7 0x00007fd8443a94a6 in access_with_adjusted_size ()
#8 0x00007fd8443a9901 in memory_region_iorange_write ()
#9 0x00007fd8443a19bd in ioport_writeb_thunk ()
#10 0x00007fd8443a13a8 in ioport_write ()
#11 0x00007fd8443a1f55 in cpu_outb ()
#12 0x00007fd8443a5b12 in kvm_handle_io ()
#13 0x00007fd8443a64a9 in kvm_cpu_exec ()
#14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()
Thread 7 (Thread 0x7fd8403b4700 (LWP 6672)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 6 (Thread 0x7fd83fbb3700 (LWP 6673)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 5 (Thread 0x7fd83f3b2700 (LWP 6674)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 4 (Thread 0x7fd83ebb1700 (LWP 6675)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7fd83e3b0700 (LWP 6676)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7fd23b7ff700 (LWP 6679)):
#0 0x00007fd8427eb61c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x00007fd8444528f0 in qemu_cond_wait ()
#2 0x00007fd844312d9d in vnc_worker_thread_loop ()
#3 0x00007fd844313315 in vnc_worker_thread ()
#4 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7fd844068840 (LWP 6662)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd84429b991 in os_host_main_loop_wait ()
#6 0x00007fd84429ba50 in main_loop_wait ()
#7 0x00007fd844322793 in main_loop ()
#8 0x00007fd844329a9f in main ()
I think the VM hangs because the VCPU thread holds the global QEMU
mutex while waiting for the I/O to come back. In my test, however, the
I/O never comes back (because of the multipath software), so the VCPU
thread never releases the global lock and the other threads can never
acquire it. Is there any way to solve this whole-VM hang?
I also ran the same test on the VMware platform: the I/O hangs, but the
VM keeps working. Thanks!
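To make the deadlock shape in thread 8 concrete, here is a small,
self-contained C sketch of the same pattern (illustrative stand-ins
only, not QEMU source; the real functions are
qemu_mutex_lock_iothread(), bdrv_drain_all() and aio_poll()):

#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Stand-in for the "big QEMU lock" every thread in the dump fights over. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

static bool requests_pending(void)     { return true; } /* I/O never completes */
static void poll_for_completions(void) { sleep(1); }    /* aio_poll() stand-in */

/* Shape of thread 8: the VCPU thread handles the guest's BMDMA
 * register write, takes the global mutex, then drains all I/O. */
static void *vcpu_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&global_mutex);   /* qemu_mutex_lock_iothread() */

    /* bdrv_drain_all(): loop until every in-flight request finishes.
     * With one hung request, this loop never exits... */
    while (requests_pending()) {
        poll_for_completions();
    }

    pthread_mutex_unlock(&global_mutex); /* ...so this never runs, and the
                                          * other threads block on the lock
                                          * forever, exactly like threads
                                          * 1-7 above. */
    return NULL;
}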
* [Qemu-devel] the whole virtual machine hangs when IO does not come back!
@ 2014-08-11 10:08 Bin Wu
2014-08-11 11:39 ` Gonglei (Arei)
0 siblings, 1 reply; 10+ messages in thread
From: Bin Wu @ 2014-08-11 10:08 UTC (permalink / raw)
To: qemu-devel; +Cc: rudy.zhangmin, peter.huangpeng
Hi,
I tested the reliability of QEMU in an IPSAN environment as follows:
(1) create one VM on an x86 server connected to the IPSAN; the VM has
only one system volume, which is on the IPSAN;
(2) disconnect the network between the server and the IPSAN. On the
server, I run "multipath" software that can hold I/O for a long
(configurable) time when the network is disconnected;
(3) about 30 seconds later, the whole VM hangs; nothing can be done
with the VM!
Then I used the "gstack" tool to collect the stacks of all QEMU
threads, which looked like this:
Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
#0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
#1 0x00007fd84410ceff in aio_poll ()
#2 0x00007fd84429bb05 in qemu_aio_wait ()
#3 0x00007fd844120f51 in bdrv_drain_all ()
#4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
#5 0x00007fd8441f216e in bmdma_write ()
#6 0x00007fd8443a93cf in memory_region_write_accessor ()
#7 0x00007fd8443a94a6 in access_with_adjusted_size ()
#8 0x00007fd8443a9901 in memory_region_iorange_write ()
#9 0x00007fd8443a19bd in ioport_writeb_thunk ()
#10 0x00007fd8443a13a8 in ioport_write ()
#11 0x00007fd8443a1f55 in cpu_outb ()
#12 0x00007fd8443a5b12 in kvm_handle_io ()
#13 0x00007fd8443a64a9 in kvm_cpu_exec ()
#14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()
Thread 7 (Thread 0x7fd8403b4700 (LWP 6672)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 6 (Thread 0x7fd83fbb3700 (LWP 6673)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 5 (Thread 0x7fd83f3b2700 (LWP 6674)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 4 (Thread 0x7fd83ebb1700 (LWP 6675)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7fd83e3b0700 (LWP 6676)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd8443a63b9 in kvm_cpu_exec ()
#6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
#7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7fd23b7ff700 (LWP 6679)):
#0 0x00007fd8427eb61c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x00007fd8444528f0 in qemu_cond_wait ()
#2 0x00007fd844312d9d in vnc_worker_thread_loop ()
#3 0x00007fd844313315 in vnc_worker_thread ()
#4 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fd8425439cd in clone () from /lib64/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7fd844068840 (LWP 6662)):
#0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007fd8427e942e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd8444526bd in qemu_mutex_lock ()
#4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
#5 0x00007fd84429b991 in os_host_main_loop_wait ()
#6 0x00007fd84429ba50 in main_loop_wait ()
#7 0x00007fd844322793 in main_loop ()
#8 0x00007fd844329a9f in main ()
I think the VM hangs because the VCPU thread holds the global QEMU
mutex while waiting for the I/O to come back. In my test, however, the
I/O never comes back (because of the multipath software), so the VCPU
thread never releases the global lock and the other threads can never
acquire it. Is there any way to solve this whole-VM hang?
I also ran the same test on the VMware platform: the I/O hangs, but the
VM keeps working. Thanks!
* Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
2014-08-11 10:08 Bin Wu
@ 2014-08-11 11:39 ` Gonglei (Arei)
2014-08-17 8:12 ` Paolo Bonzini
0 siblings, 1 reply; 10+ messages in thread
From: Gonglei (Arei) @ 2014-08-11 11:39 UTC (permalink / raw)
To: Wubin (H), qemu-devel@nongnu.org
Cc: kwolf@redhat.com, pbonzini@redhat.com, Zhangmin (Rudy),
Huangpeng (Peter), stefanha@redhat.com
Hi,
Cc'ing Kevin, Stefan, and Paolo for more attention.
Best regards,
-Gonglei
> -----Original Message-----
> Subject: [Qemu-devel] the whole virtual machine hangs when IO does not come
> back!
>
> Hi,
>
> I tested the reliability of QEMU in an IPSAN environment as follows:
> (1) create one VM on an x86 server connected to the IPSAN; the VM
> has only one system volume, which is on the IPSAN;
> (2) disconnect the network between the server and the IPSAN. On the
> server, I run "multipath" software that can hold I/O for a long
> (configurable) time when the network is disconnected;
> (3) about 30 seconds later, the whole VM hangs; nothing can be done
> with the VM!
>
> Then I used the "gstack" tool to collect the stacks of all QEMU
> threads, which looked like this:
>
> Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
> #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
> #1 0x00007fd84410ceff in aio_poll ()
> #2 0x00007fd84429bb05 in qemu_aio_wait ()
> #3 0x00007fd844120f51 in bdrv_drain_all ()
> #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
> #5 0x00007fd8441f216e in bmdma_write ()
> #6 0x00007fd8443a93cf in memory_region_write_accessor ()
> #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
> #8 0x00007fd8443a9901 in memory_region_iorange_write ()
> #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
> #10 0x00007fd8443a13a8 in ioport_write ()
> #11 0x00007fd8443a1f55 in cpu_outb ()
> #12 0x00007fd8443a5b12 in kvm_handle_io ()
> #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
> #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> #17 0x0000000000000000 in ?? ()
>
> Thread 7 (Thread 0x7fd8403b4700 (LWP 6672)):
> #0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
> #2 0x00007fd8427e942e in pthread_mutex_lock () from
> /lib64/libpthread.so.0
> #3 0x00007fd8444526bd in qemu_mutex_lock ()
> #4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
> #5 0x00007fd8443a63b9 in kvm_cpu_exec ()
> #6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> #7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> #8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> #9 0x0000000000000000 in ?? ()
>
> Thread 6 (Thread 0x7fd83fbb3700 (LWP 6673)):
> #0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
> #2 0x00007fd8427e942e in pthread_mutex_lock () from
> /lib64/libpthread.so.0
> #3 0x00007fd8444526bd in qemu_mutex_lock ()
> #4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
> #5 0x00007fd8443a63b9 in kvm_cpu_exec ()
> #6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> #7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> #8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> #9 0x0000000000000000 in ?? ()
>
> Thread 5 (Thread 0x7fd83f3b2700 (LWP 6674)):
> #0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
> #2 0x00007fd8427e942e in pthread_mutex_lock () from
> /lib64/libpthread.so.0
> #3 0x00007fd8444526bd in qemu_mutex_lock ()
> #4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
> #5 0x00007fd8443a63b9 in kvm_cpu_exec ()
> #6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> #7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> #8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> #9 0x0000000000000000 in ?? ()
>
> Thread 4 (Thread 0x7fd83ebb1700 (LWP 6675)):
> #0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
> #2 0x00007fd8427e942e in pthread_mutex_lock () from
> /lib64/libpthread.so.0
> #3 0x00007fd8444526bd in qemu_mutex_lock ()
> #4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
> #5 0x00007fd8443a63b9 in kvm_cpu_exec ()
> #6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> #7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> #8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> #9 0x0000000000000000 in ?? ()
>
> Thread 3 (Thread 0x7fd83e3b0700 (LWP 6676)):
> #0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
> #2 0x00007fd8427e942e in pthread_mutex_lock () from
> /lib64/libpthread.so.0
> #3 0x00007fd8444526bd in qemu_mutex_lock ()
> #4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
> #5 0x00007fd8443a63b9 in kvm_cpu_exec ()
> #6 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> #7 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> #8 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> #9 0x0000000000000000 in ?? ()
>
> Thread 2 (Thread 0x7fd23b7ff700 (LWP 6679)):
> #0 0x00007fd8427eb61c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x00007fd8444528f0 in qemu_cond_wait ()
> #2 0x00007fd844312d9d in vnc_worker_thread_loop ()
> #3 0x00007fd844313315 in vnc_worker_thread ()
> #4 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> #5 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> #6 0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7fd844068840 (LWP 6662)):
> #0 0x00007fd8427ee294 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007fd8427e9619 in _L_lock_1008 () from /lib64/libpthread.so.0
> #2 0x00007fd8427e942e in pthread_mutex_lock () from
> /lib64/libpthread.so.0
> #3 0x00007fd8444526bd in qemu_mutex_lock ()
> #4 0x00007fd844330f47 in qemu_mutex_lock_iothread ()
> #5 0x00007fd84429b991 in os_host_main_loop_wait ()
> #6 0x00007fd84429ba50 in main_loop_wait ()
> #7 0x00007fd844322793 in main_loop ()
> #8 0x00007fd844329a9f in main ()
>
> I think the VM hangs because the VCPU thread holds the global QEMU
> mutex while waiting for the I/O to come back. In my test, however, the
> I/O never comes back (because of the multipath software), so the VCPU
> thread never releases the global lock and the other threads can never
> acquire it. Is there any way to solve this whole-VM hang?
> I also ran the same test on the VMware platform: the I/O hangs, but the
> VM keeps working. Thanks!
>
* Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
2014-08-11 8:33 [Qemu-devel] the whole virtual machine hangs when IO does not come back! Bin Wu
@ 2014-08-11 14:21 ` Stefan Hajnoczi
2014-08-12 0:58 ` Fam Zheng
2014-08-12 1:10 ` [Qemu-devel] the whole virtual machine hangs when IO does not come back! Bin Wu
0 siblings, 2 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2014-08-11 14:21 UTC (permalink / raw)
To: Bin Wu; +Cc: qemu-devel, peter.huangpeng
On Mon, Aug 11, 2014 at 04:33:21PM +0800, Bin Wu wrote:
> Hi,
>
> I tested the reliability of QEMU in an IPSAN environment as follows:
> (1) create one VM on an x86 server connected to the IPSAN; the VM
> has only one system volume, which is on the IPSAN;
> (2) disconnect the network between the server and the IPSAN. On the server,
> I run "multipath" software that can hold I/O for a long time
> (configurable) when the network is disconnected;
> (3) about 30 seconds later, the whole VM hangs; nothing can be done to
> the VM!
>
> Then I used the "gstack" tool to collect the stacks of all QEMU threads,
> which looked like this:
>
> Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
> #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
> #1 0x00007fd84410ceff in aio_poll ()
> #2 0x00007fd84429bb05 in qemu_aio_wait ()
> #3 0x00007fd844120f51 in bdrv_drain_all ()
> #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
> #5 0x00007fd8441f216e in bmdma_write ()
> #6 0x00007fd8443a93cf in memory_region_write_accessor ()
> #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
> #8 0x00007fd8443a9901 in memory_region_iorange_write ()
> #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
> #10 0x00007fd8443a13a8 in ioport_write ()
> #11 0x00007fd8443a1f55 in cpu_outb ()
> #12 0x00007fd8443a5b12 in kvm_handle_io ()
> #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
> #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> #17 0x0000000000000000 in ?? ()
Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
Note that the QEMU monitor commands are typically synchronous so they
will still block the VM.
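For illustration, a command line of the kind this suggests (the values
here are hypothetical; adjust the device path and options to your setup):

qemu-system-x86_64 \
    -enable-kvm -m 2048 \
    -drive file=/dev/mapper/mpatha,format=raw,cache=none,if=virtio

if=virtio attaches the disk as a virtio-blk device instead of the
default IDE device whose bmdma_cmd_writeb() path appears in the
backtrace above.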
Stefan
* Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
2014-08-11 14:21 ` Stefan Hajnoczi
@ 2014-08-12 0:58 ` Fam Zheng
2014-08-12 2:09 ` [Qemu-devel] the whole virtual machine hangs when IO does not come back! Zhang Haoyu
2014-08-12 1:10 ` [Qemu-devel] the whole virtual machine hangs when IO does not come back! Bin Wu
1 sibling, 1 reply; 10+ messages in thread
From: Fam Zheng @ 2014-08-12 0:58 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, Bin Wu, peter.huangpeng
On Mon, 08/11 15:21, Stefan Hajnoczi wrote:
> On Mon, Aug 11, 2014 at 04:33:21PM +0800, Bin Wu wrote:
> > Hi,
> >
> > I tested the reliability of QEMU in an IPSAN environment as follows:
> > (1) create one VM on an x86 server connected to the IPSAN; the VM
> > has only one system volume, which is on the IPSAN;
> > (2) disconnect the network between the server and the IPSAN. On the server,
> > I run "multipath" software that can hold I/O for a long time
> > (configurable) when the network is disconnected;
> > (3) about 30 seconds later, the whole VM hangs; nothing can be done to
> > the VM!
> >
> > Then I used the "gstack" tool to collect the stacks of all QEMU threads,
> > which looked like this:
> >
> > Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
> > #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
> > #1 0x00007fd84410ceff in aio_poll ()
> > #2 0x00007fd84429bb05 in qemu_aio_wait ()
> > #3 0x00007fd844120f51 in bdrv_drain_all ()
> > #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
> > #5 0x00007fd8441f216e in bmdma_write ()
> > #6 0x00007fd8443a93cf in memory_region_write_accessor ()
> > #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
> > #8 0x00007fd8443a9901 in memory_region_iorange_write ()
> > #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
> > #10 0x00007fd8443a13a8 in ioport_write ()
> > #11 0x00007fd8443a1f55 in cpu_outb ()
> > #12 0x00007fd8443a5b12 in kvm_handle_io ()
> > #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
> > #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> > #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> > #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> > #17 0x0000000000000000 in ?? ()
>
> Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
>
> Note that the QEMU monitor commands are typically synchronous so they
> will still block the VM.
>
If some of the requests are dropped by the host and never return to
QEMU, I think bdrv_drain_all() will still cause the hang. Even with
virtio-blk, the reset path makes such a call. Maybe we could add some
-ETIMEDOUT mechanism to QEMU's block layer.
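A minimal, self-contained C sketch of the general shape such a timeout
could take (hypothetical: QEMU's block layer has no such API, and the
helper names here are made up for illustration):

#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/* Wait for fd to signal an I/O completion, but give up with
 * -ETIMEDOUT once the deadline passes instead of blocking forever
 * the way the drain loop in the backtrace does. */
static int wait_for_completion(int fd, int64_t deadline_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int64_t left = deadline_ms - now_ms();
        if (left <= 0) {
            return -ETIMEDOUT;        /* fail the request, don't hang */
        }
        int r = poll(&pfd, 1, (int)left);
        if (r > 0) {
            return 0;                 /* completion arrived */
        }
        if (r < 0 && errno != EINTR) {
            return -errno;
        }
        /* r == 0: poll timed out; the loop re-checks the deadline */
    }
}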
A workaround might be to configure the host storage to fail the I/O
after a timeout.
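For dm-multipath specifically, that usually means disabling indefinite
queueing, e.g. along these lines in /etc/multipath.conf (the exact
attributes depend on the multipath-tools version in use):

defaults {
    # Fail I/O once all paths are down instead of queueing forever;
    # a numeric value would mean "retry N checker intervals, then fail".
    no_path_retry fail
}

With queueing disabled, the stuck requests complete with an error and
the drain loop can make progress again.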
Fam
* Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
2014-08-11 14:21 ` Stefan Hajnoczi
2014-08-12 0:58 ` Fam Zheng
@ 2014-08-12 1:10 ` Bin Wu
2014-09-08 8:35 ` Stefan Hajnoczi
1 sibling, 1 reply; 10+ messages in thread
From: Bin Wu @ 2014-08-12 1:10 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, peter.huangpeng
On 2014/8/11 22:21, Stefan Hajnoczi wrote:
> On Mon, Aug 11, 2014 at 04:33:21PM +0800, Bin Wu wrote:
>> Hi,
>>
>> I tested the reliability of QEMU in an IPSAN environment as follows:
>> (1) create one VM on an x86 server connected to the IPSAN; the VM
>> has only one system volume, which is on the IPSAN;
>> (2) disconnect the network between the server and the IPSAN. On the server,
>> I run "multipath" software that can hold I/O for a long time
>> (configurable) when the network is disconnected;
>> (3) about 30 seconds later, the whole VM hangs; nothing can be done to
>> the VM!
>>
>> Then I used the "gstack" tool to collect the stacks of all QEMU threads,
>> which looked like this:
>>
>> Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
>> #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
>> #1 0x00007fd84410ceff in aio_poll ()
>> #2 0x00007fd84429bb05 in qemu_aio_wait ()
>> #3 0x00007fd844120f51 in bdrv_drain_all ()
>> #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
>> #5 0x00007fd8441f216e in bmdma_write ()
>> #6 0x00007fd8443a93cf in memory_region_write_accessor ()
>> #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
>> #8 0x00007fd8443a9901 in memory_region_iorange_write ()
>> #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
>> #10 0x00007fd8443a13a8 in ioport_write ()
>> #11 0x00007fd8443a1f55 in cpu_outb ()
>> #12 0x00007fd8443a5b12 in kvm_handle_io ()
>> #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
>> #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
>> #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
>> #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
>> #17 0x0000000000000000 in ?? ()
> Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
>
> Note that the QEMU monitor commands are typically synchronous so they
> will still block the VM.
>
> Stefan
Thank you for your attention. I tested virtio-blk, and indeed the VM
doesn't hang.
Why does virtio-blk implement this asynchronously, while virtio-scsi
does it synchronously?
* Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
2014-08-12 0:58 ` Fam Zheng
@ 2014-08-12 2:09 ` Zhang Haoyu
2014-08-12 2:27 ` Fam Zheng
0 siblings, 1 reply; 10+ messages in thread
From: Zhang Haoyu @ 2014-08-12 2:09 UTC (permalink / raw)
To: Fam Zheng, Stefan Hajnoczi; +Cc: qemu-devel, Bin Wu
>> > Hi,
>> >
>> > I tested the reliability of QEMU in an IPSAN environment as follows:
>> > (1) create one VM on an x86 server connected to the IPSAN; the VM
>> > has only one system volume, which is on the IPSAN;
>> > (2) disconnect the network between the server and the IPSAN. On the server,
>> > I run "multipath" software that can hold I/O for a long time
>> > (configurable) when the network is disconnected;
>> > (3) about 30 seconds later, the whole VM hangs; nothing can be done to
>> > the VM!
>> >
>> > Then I used the "gstack" tool to collect the stacks of all QEMU threads,
>> > which looked like this:
>> >
>> > Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
>> > #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
>> > #1 0x00007fd84410ceff in aio_poll ()
>> > #2 0x00007fd84429bb05 in qemu_aio_wait ()
>> > #3 0x00007fd844120f51 in bdrv_drain_all ()
>> > #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
>> > #5 0x00007fd8441f216e in bmdma_write ()
>> > #6 0x00007fd8443a93cf in memory_region_write_accessor ()
>> > #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
>> > #8 0x00007fd8443a9901 in memory_region_iorange_write ()
>> > #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
>> > #10 0x00007fd8443a13a8 in ioport_write ()
>> > #11 0x00007fd8443a1f55 in cpu_outb ()
>> > #12 0x00007fd8443a5b12 in kvm_handle_io ()
>> > #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
>> > #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
>> > #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
>> > #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
>> > #17 0x0000000000000000 in ?? ()
>>
>> Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
>>
>> Note that the QEMU monitor commands are typically synchronous so they
>> will still block the VM.
>>
>
>If some of the requests are dropped by the host and never return to QEMU, I think
>bdrv_drain_all() will still cause the hang. Even with virtio-blk, the reset path
>makes such a call. Maybe we could add some -ETIMEDOUT mechanism to QEMU's block
>layer.
>
>A workaround might be to configure the host storage to fail the I/O after a
>timeout.
>
If -ETIMEDOUT is returned after a short network disconnection, could an
unpredictable fault happen in the VM? For example, suppose the VM was
reading important data (such as system data) at the time.
Does AIO replay work for this case?
Thanks,
Zhang Haoyu
>Fam
* Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
2014-08-12 2:09 ` [Qemu-devel] the whole virtual machine hangs when IO does not come back! Zhang Haoyu
@ 2014-08-12 2:27 ` Fam Zheng
0 siblings, 0 replies; 10+ messages in thread
From: Fam Zheng @ 2014-08-12 2:27 UTC (permalink / raw)
To: Zhang Haoyu; +Cc: Stefan Hajnoczi, qemu-devel, Bin Wu
On Tue, 08/12 10:09, Zhang Haoyu wrote:
> >> > Hi,
> >> >
> >> > I tested the reliability of QEMU in an IPSAN environment as follows:
> >> > (1) create one VM on an x86 server connected to the IPSAN; the VM
> >> > has only one system volume, which is on the IPSAN;
> >> > (2) disconnect the network between the server and the IPSAN. On the server,
> >> > I run "multipath" software that can hold I/O for a long time
> >> > (configurable) when the network is disconnected;
> >> > (3) about 30 seconds later, the whole VM hangs; nothing can be done to
> >> > the VM!
> >> >
> >> > Then I used the "gstack" tool to collect the stacks of all QEMU threads,
> >> > which looked like this:
> >> >
> >> > Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
> >> > #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
> >> > #1 0x00007fd84410ceff in aio_poll ()
> >> > #2 0x00007fd84429bb05 in qemu_aio_wait ()
> >> > #3 0x00007fd844120f51 in bdrv_drain_all ()
> >> > #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
> >> > #5 0x00007fd8441f216e in bmdma_write ()
> >> > #6 0x00007fd8443a93cf in memory_region_write_accessor ()
> >> > #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
> >> > #8 0x00007fd8443a9901 in memory_region_iorange_write ()
> >> > #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
> >> > #10 0x00007fd8443a13a8 in ioport_write ()
> >> > #11 0x00007fd8443a1f55 in cpu_outb ()
> >> > #12 0x00007fd8443a5b12 in kvm_handle_io ()
> >> > #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
> >> > #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> >> > #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> >> > #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> >> > #17 0x0000000000000000 in ?? ()
> >>
> >> Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
> >>
> >> Note that the QEMU monitor commands are typically synchronous so they
> >> will still block the VM.
> >>
> >
> >If some of the requests are dropped by the host and never return to QEMU, I think
> >bdrv_drain_all() will still cause the hang. Even with virtio-blk, the reset path
> >makes such a call. Maybe we could add some -ETIMEDOUT mechanism to QEMU's block
> >layer.
> >
> >A workaround might be to configure the host storage to fail the I/O after a
> >timeout.
> >
> If -ETIMEDOUT is returned after a short network disconnection, could an
> unpredictable fault happen in the VM? For example, suppose the VM was
> reading important data (such as system data) at the time.
> Does AIO replay work for this case?
The guest should do error handling for it, much as it would for -EIO. The
connection is still down even if the guest is free to retry, isn't it?
Fam
* Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
2014-08-11 11:39 ` Gonglei (Arei)
@ 2014-08-17 8:12 ` Paolo Bonzini
0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2014-08-17 8:12 UTC (permalink / raw)
To: Gonglei (Arei), Wubin (H), qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Zhangmin (Rudy), Huangpeng (Peter),
stefanha@redhat.com
On 11/08/2014 13:39, Gonglei (Arei) wrote:
>> I think the VM hangs because the VCPU thread holds the global QEMU
>> mutex while waiting for the I/O to come back. In my test, however, the
>> I/O never comes back (because of the multipath software), so the VCPU
>> thread never releases the global lock and the other threads can never
>> acquire it. Is there any way to solve this whole-VM hang?
The problem is that qemu_aio_cancel is a synchronous function.
You can work around it with an eh_timed_out function in the virtio-scsi
driver, but the real fix would be in QEMU.
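A minimal guest-side sketch of that workaround (hypothetical code: the
upstream virtio_scsi driver had no such hook at the time, although
.eh_timed_out is a real field of Linux's struct scsi_host_template):

#include <linux/blkdev.h>
#include <linux/module.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>

/* Keep re-arming the request timer so the SCSI midlayer never
 * escalates to the abort path that lands in QEMU's synchronous
 * qemu_aio_cancel(). */
static enum blk_eh_timer_return virtscsi_eh_timed_out(struct scsi_cmnd *sc)
{
    return BLK_EH_RESET_TIMER;
}

static struct scsi_host_template virtscsi_host_template = {
    .module       = THIS_MODULE,
    .name         = "Virtio SCSI HBA",
    .eh_timed_out = virtscsi_eh_timed_out,
    /* ... queuecommand and the other ops elided ... */
};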
Paolo
>> I also ran the same test on the VMware platform: the I/O hangs, but the
>> VM keeps working. Thanks!
>>
>
* Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
2014-08-12 1:10 ` [Qemu-devel] the whole virtual machine hangs when IO does not come back! Bin Wu
@ 2014-09-08 8:35 ` Stefan Hajnoczi
0 siblings, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2014-09-08 8:35 UTC (permalink / raw)
To: Bin Wu; +Cc: qemu-devel, peter.huangpeng
On Tue, Aug 12, 2014 at 09:10:26AM +0800, Bin Wu wrote:
> On 2014/8/11 22:21, Stefan Hajnoczi wrote:
> >On Mon, Aug 11, 2014 at 04:33:21PM +0800, Bin Wu wrote:
> >>Hi,
> >>
> >>I tested the reliability of QEMU in an IPSAN environment as follows:
> >>(1) create one VM on an x86 server connected to the IPSAN; the VM
> >>has only one system volume, which is on the IPSAN;
> >>(2) disconnect the network between the server and the IPSAN. On the server,
> >>I run "multipath" software that can hold I/O for a long time
> >>(configurable) when the network is disconnected;
> >>(3) about 30 seconds later, the whole VM hangs; nothing can be done to
> >>the VM!
> >>
> >>Then I used the "gstack" tool to collect the stacks of all QEMU threads,
> >>which looked like this:
> >>
> >>Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
> >>#0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
> >>#1 0x00007fd84410ceff in aio_poll ()
> >>#2 0x00007fd84429bb05 in qemu_aio_wait ()
> >>#3 0x00007fd844120f51 in bdrv_drain_all ()
> >>#4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
> >>#5 0x00007fd8441f216e in bmdma_write ()
> >>#6 0x00007fd8443a93cf in memory_region_write_accessor ()
> >>#7 0x00007fd8443a94a6 in access_with_adjusted_size ()
> >>#8 0x00007fd8443a9901 in memory_region_iorange_write ()
> >>#9 0x00007fd8443a19bd in ioport_writeb_thunk ()
> >>#10 0x00007fd8443a13a8 in ioport_write ()
> >>#11 0x00007fd8443a1f55 in cpu_outb ()
> >>#12 0x00007fd8443a5b12 in kvm_handle_io ()
> >>#13 0x00007fd8443a64a9 in kvm_cpu_exec ()
> >>#14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> >>#15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> >>#16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> >>#17 0x0000000000000000 in ?? ()
> >Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
> >
> >Note that the QEMU monitor commands are typically synchronous so they
> >will still block the VM.
> >
> >Stefan
> Thank you for your attention. I tested virtio-blk, and indeed the VM
> doesn't hang.
> Why does virtio-blk implement this asynchronously, while virtio-scsi
> does it synchronously?
There is no fundamental reason why virtio-scsi should be synchronous;
it's just that QEMU internally has some points (such as the
bdrv_drain_all() that Fam mentioned) that wait synchronously.
Since the SCSI cancel code path hit bdrv_drain_all(), the guest hung.
virtio-blk doesn't have a "cancel" operation and therefore doesn't hang.
Stefan
Thread overview: 10+ messages
2014-08-11 8:33 [Qemu-devel] the whole virtual machine hangs when IO does not come back! Bin Wu
2014-08-11 14:21 ` Stefan Hajnoczi
2014-08-12 0:58 ` Fam Zheng
2014-08-12 2:09 ` [Qemu-devel] the whole virtual machine hangs when IO does not come back! Zhang Haoyu
2014-08-12 2:27 ` Fam Zheng
2014-08-12 1:10 ` [Qemu-devel] the whole virtual machine hangs when IO does not come back! Bin Wu
2014-09-08 8:35 ` Stefan Hajnoczi
-- strict thread matches above, loose matches on Subject: below --
2014-08-11 10:08 Bin Wu
2014-08-11 11:39 ` Gonglei (Arei)
2014-08-17 8:12 ` Paolo Bonzini