qemu-devel.nongnu.org archive mirror
* [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
@ 2015-06-02 14:36 Christian Borntraeger
  2015-06-02 14:51 ` Paolo Bonzini
  2015-06-09  2:28 ` Fam Zheng
  0 siblings, 2 replies; 13+ messages in thread
From: Christian Borntraeger @ 2015-06-02 14:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

Paolo,

I bisected a problem with hanging guests to this commit:

commit a0710f7995f914e3044e5899bd8ff6c43c62f916
Author:     Paolo Bonzini <pbonzini@redhat.com>
AuthorDate: Fri Feb 20 17:26:52 2015 +0100
Commit:     Kevin Wolf <kwolf@redhat.com>
CommitDate: Tue Apr 28 15:36:08 2015 +0200

    iothread: release iothread around aio_poll

Running many guests, each booted with a kernel/ramdisk (via -kernel) and
several null block devices, results in hangs. All hanging guests are in
partition detection code waiting for an I/O to complete, so this happens
very early, maybe even on the first I/O.

Reverting that commit "fixes" the hangs.
Any ideas?

Christian

PS: A guest XML looks like this:


<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>test2</name>
  <uuid>13bd8253-9abb-4be8-9399-73b899762aaa</uuid>
  <memory unit='KiB'>286720</memory>
  <currentMemory unit='KiB'>286720</currentMemory>
  <vcpu placement='static'>10</vcpu>
  <iothreads>8</iothreads>
  <cputune>
    <shares>12</shares>
  </cputune>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio'>hvm</type>
    <kernel>/boot/vmlinux-4.0.0+</kernel>
    <initrd>/boot/ramdisk.reboot</initrd>
    <cmdline>root=/dev/ram0</cmdline>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>preserve</on_crash>
  <devices>
    <controller type='usb' index='0' model='none'/>
    <console type='pty'>
      <target type='sclp' port='0'/>
    </console>
    <memballoon model='none'/>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null1,serial=null1,iothread=iothread1'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null2,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null2,serial=null2,iothread=iothread2'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null3,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null3,serial=null3,iothread=iothread3'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null4,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null4,serial=null4,iothread=iothread4'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null5,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null5,serial=null5,iothread=iothread5'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null6,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null6,serial=null6,iothread=iothread6'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null7,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null7,serial=null7,iothread=iothread7'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null8,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null8,serial=null8,iothread=iothread8'/>
  </qemu:commandline>
</domain>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-02 14:36 [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup Christian Borntraeger
@ 2015-06-02 14:51 ` Paolo Bonzini
  2015-06-03  9:17   ` Stefan Hajnoczi
  2015-06-09  2:28 ` Fam Zheng
  1 sibling, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2015-06-02 14:51 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi



On 02/06/2015 16:36, Christian Borntraeger wrote:
> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
> Author:     Paolo Bonzini <pbonzini@redhat.com>
> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
> Commit:     Kevin Wolf <kwolf@redhat.com>
> CommitDate: Tue Apr 28 15:36:08 2015 +0200
> 
>     iothread: release iothread around aio_poll
> 
> to cause a problem with hanging guests.
> 
> Having many guests all with a kernel/ramdisk (via -kernel) and
> several null block devices will result in hangs. All hanging 
> guests are in partition detection code waiting for an I/O to return
> so very early maybe even the first I/O.
> 
> Reverting that commit "fixes" the hangs.
> Any ideas?

Stefan, please revert it as I will not have time to look at it until
well into 2.4 soft freeze.

Paolo


* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-02 14:51 ` Paolo Bonzini
@ 2015-06-03  9:17   ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2015-06-03  9:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Christian Borntraeger, qemu-devel


On Tue, Jun 02, 2015 at 04:51:46PM +0200, Paolo Bonzini wrote:
> 
> 
> On 02/06/2015 16:36, Christian Borntraeger wrote:
> > commit a0710f7995f914e3044e5899bd8ff6c43c62f916
> > Author:     Paolo Bonzini <pbonzini@redhat.com>
> > AuthorDate: Fri Feb 20 17:26:52 2015 +0100
> > Commit:     Kevin Wolf <kwolf@redhat.com>
> > CommitDate: Tue Apr 28 15:36:08 2015 +0200
> > 
> >     iothread: release iothread around aio_poll
> > 
> > to cause a problem with hanging guests.
> > 
> > Having many guests all with a kernel/ramdisk (via -kernel) and
> > several null block devices will result in hangs. All hanging 
> > guests are in partition detection code waiting for an I/O to return
> > so very early maybe even the first I/O.
> > 
> > Reverting that commit "fixes" the hangs.
> > Any ideas?
> 
> Stefan, please revert it as I will not have time to look at it until
> well into 2.4 soft freeze.

Ok

Stefan



* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-02 14:36 [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup Christian Borntraeger
  2015-06-02 14:51 ` Paolo Bonzini
@ 2015-06-09  2:28 ` Fam Zheng
  2015-06-09  9:01   ` Christian Borntraeger
  1 sibling, 1 reply; 13+ messages in thread
From: Fam Zheng @ 2015-06-09  2:28 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Tue, 06/02 16:36, Christian Borntraeger wrote:
> Paolo,
> 
> I bisected 
> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
> Author:     Paolo Bonzini <pbonzini@redhat.com>
> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
> Commit:     Kevin Wolf <kwolf@redhat.com>
> CommitDate: Tue Apr 28 15:36:08 2015 +0200
> 
>     iothread: release iothread around aio_poll
> 
> to cause a problem with hanging guests.
> 
> Having many guests all with a kernel/ramdisk (via -kernel) and
> several null block devices will result in hangs. All hanging 
> guests are in partition detection code waiting for an I/O to return
> so very early maybe even the first I/O.
> 
> Reverting that commit "fixes" the hangs.
> Any ideas?

Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
have a reproducer for x86? Or could you collect backtraces for all the threads
in QEMU when it hangs?

My long shot is that the main loop is blocked at aio_context_acquire(ctx),
while the iothread of that ctx is blocked at aio_poll(ctx, blocking).

Thanks,
Fam


* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-09  2:28 ` Fam Zheng
@ 2015-06-09  9:01   ` Christian Borntraeger
  2015-06-10  2:12     ` Fam Zheng
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2015-06-09  9:01 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

Am 09.06.2015 um 04:28 schrieb Fam Zheng:
> On Tue, 06/02 16:36, Christian Borntraeger wrote:
>> Paolo,
>>
>> I bisected 
>> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
>> Author:     Paolo Bonzini <pbonzini@redhat.com>
>> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
>> Commit:     Kevin Wolf <kwolf@redhat.com>
>> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>>
>>     iothread: release iothread around aio_poll
>>
>> to cause a problem with hanging guests.
>>
>> Having many guests all with a kernel/ramdisk (via -kernel) and
>> several null block devices will result in hangs. All hanging 
>> guests are in partition detection code waiting for an I/O to return
>> so very early maybe even the first I/O.
>>
>> Reverting that commit "fixes" the hangs.
>> Any ideas?
> 
> Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
> have a reproducer for x86? Or could you collect backtraces for all the threads
> in QEMU when it hangs?
> 
> My long shot is that the main loop is blocked at aio_context_acquire(ctx),
> while the iothread of that ctx is blocked at aio_poll(ctx, blocking).

Here is a backtrace on s390. I need two or more disks (one is not enough).

Thread 5 (Thread 0x3fffb406910 (LWP 74602)):
#0  0x000003fffc0bde8e in syscall () from /lib64/libc.so.6
#1  0x00000000801dd282 in futex_wait (val=4294967295, ev=0x8079c6c4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:301
#2  qemu_event_wait (ev=ev@entry=0x8079c6c4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:399
#3  0x00000000801ec75c in call_rcu_thread (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/util/rcu.c:233
#4  0x000003fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#5  0x000003fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 4 (Thread 0x3fffabf7910 (LWP 74604)):
#0  0x000003fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x000000008016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:310
#3  0x0000000080170d32 in aio_poll (ctx=0x807d6a70, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:274
#4  0x00000000800b3758 in iothread_run (opaque=0x807d6690) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0x3fffa3f7910 (LWP 74605)):
#0  0x000003fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x000000008016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:310
#3  0x0000000080170d32 in aio_poll (ctx=0x807d94a0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:274
#4  0x00000000800b3758 in iothread_run (opaque=0x807d6f60) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 2 (Thread 0x3fff8a21910 (LWP 74625)):
#0  0x000003fffc0b90a2 in ioctl () from /lib64/libc.so.6
#1  0x0000000080056a46 in kvm_vcpu_ioctl (cpu=cpu@entry=0x81f3b620, type=type@entry=44672) at /home/cborntra/REPOS/qemu/kvm-all.c:1916
#2  0x0000000080056b08 in kvm_cpu_exec (cpu=cpu@entry=0x81f3b620) at /home/cborntra/REPOS/qemu/kvm-all.c:1775
#3  0x00000000800445de in qemu_kvm_cpu_thread_fn (arg=0x81f3b620) at /home/cborntra/REPOS/qemu/cpus.c:979
#4  0x000003fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#5  0x000003fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3fffb408bc0 (LWP 74580)):
#0  0x000003fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x000000008016fbb0 in ppoll (__ss=0x0, __timeout=0x3ffffd64438, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=999000000) at /home/cborntra/REPOS/qemu/qemu-timer.c:322
#3  0x000000008016f230 in os_host_main_loop_wait (timeout=999000000) at /home/cborntra/REPOS/qemu/main-loop.c:239
#4  main_loop_wait (nonblocking=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:494
#5  0x000000008001346a in main_loop () at /home/cborntra/REPOS/qemu/vl.c:1789
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4391


* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-09  9:01   ` Christian Borntraeger
@ 2015-06-10  2:12     ` Fam Zheng
  2015-06-10  9:18       ` Christian Borntraeger
  0 siblings, 1 reply; 13+ messages in thread
From: Fam Zheng @ 2015-06-10  2:12 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Tue, 06/09 11:01, Christian Borntraeger wrote:
> Am 09.06.2015 um 04:28 schrieb Fam Zheng:
> > On Tue, 06/02 16:36, Christian Borntraeger wrote:
> >> Paolo,
> >>
> >> I bisected 
> >> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
> >> Author:     Paolo Bonzini <pbonzini@redhat.com>
> >> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
> >> Commit:     Kevin Wolf <kwolf@redhat.com>
> >> CommitDate: Tue Apr 28 15:36:08 2015 +0200
> >>
> >>     iothread: release iothread around aio_poll
> >>
> >> to cause a problem with hanging guests.
> >>
> >> Having many guests all with a kernel/ramdisk (via -kernel) and
> >> several null block devices will result in hangs. All hanging 
> >> guests are in partition detection code waiting for an I/O to return
> >> so very early maybe even the first I/O.
> >>
> >> Reverting that commit "fixes" the hangs.
> >> Any ideas?
> > 
> > Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
> > have a reproducer for x86? Or could you collect backtraces for all the threads
> > in QEMU when it hangs?
> > 
> > My long shot is that the main loop is blocked at aio_context_acquire(ctx),
> > while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
> 
> Here is a backtrace on s390. I need 2 or more disks, (one is not enough).

It shows that the iothreads and the main loop are all waiting for events,
while the vcpu threads are running guest code.

It could be that requests are being leaked. Do you see this problem with a
regular file-based image or with the null-co driver? Maybe we're missing
something about the AioContext handling in block/null.c.

Fam



* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-10  2:12     ` Fam Zheng
@ 2015-06-10  9:18       ` Christian Borntraeger
  2015-06-10  9:34         ` Fam Zheng
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2015-06-10  9:18 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

Am 10.06.2015 um 04:12 schrieb Fam Zheng:
> On Tue, 06/09 11:01, Christian Borntraeger wrote:
>> Am 09.06.2015 um 04:28 schrieb Fam Zheng:
>>> On Tue, 06/02 16:36, Christian Borntraeger wrote:
>>>> Paolo,
>>>>
>>>> I bisected 
>>>> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
>>>> Author:     Paolo Bonzini <pbonzini@redhat.com>
>>>> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
>>>> Commit:     Kevin Wolf <kwolf@redhat.com>
>>>> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>>>>
>>>>     iothread: release iothread around aio_poll
>>>>
>>>> to cause a problem with hanging guests.
>>>>
>>>> Having many guests all with a kernel/ramdisk (via -kernel) and
>>>> several null block devices will result in hangs. All hanging 
>>>> guests are in partition detection code waiting for an I/O to return
>>>> so very early maybe even the first I/O.
>>>>
>>>> Reverting that commit "fixes" the hangs.
>>>> Any ideas?
>>>
>>> Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
>>> have a reproducer for x86? Or could you collect backtraces for all the threads
>>> in QEMU when it hangs?
>>>
>>> My long shot is that the main loop is blocked at aio_context_acquire(ctx),
>>> while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
>>
>> Here is a backtrace on s390. I need 2 or more disks, (one is not enough).
> 
> It shows iothreads and main loop are all waiting for events, and the vcpu
> threads are running guest code.
> 
> It could be the requests being leaked. Do you see this problem with a regular
> file based image or null-co driver? Maybe we're missing something about the
> AioContext in block/null.c.

It seems to run fine with normal file-based images. As soon as I have two or
more null-aio devices, it hangs fairly quickly when doing a reboot loop.

Christian


* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-10  9:18       ` Christian Borntraeger
@ 2015-06-10  9:34         ` Fam Zheng
  2015-06-10 10:31           ` Christian Borntraeger
  2015-07-16 11:03           ` Christian Borntraeger
  0 siblings, 2 replies; 13+ messages in thread
From: Fam Zheng @ 2015-06-10  9:34 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Wed, 06/10 11:18, Christian Borntraeger wrote:
> Am 10.06.2015 um 04:12 schrieb Fam Zheng:
> > On Tue, 06/09 11:01, Christian Borntraeger wrote:
> >> Am 09.06.2015 um 04:28 schrieb Fam Zheng:
> >>> On Tue, 06/02 16:36, Christian Borntraeger wrote:
> >>>> Paolo,
> >>>>
> >>>> I bisected 
> >>>> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
> >>>> Author:     Paolo Bonzini <pbonzini@redhat.com>
> >>>> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
> >>>> Commit:     Kevin Wolf <kwolf@redhat.com>
> >>>> CommitDate: Tue Apr 28 15:36:08 2015 +0200
> >>>>
> >>>>     iothread: release iothread around aio_poll
> >>>>
> >>>> to cause a problem with hanging guests.
> >>>>
> >>>> Having many guests all with a kernel/ramdisk (via -kernel) and
> >>>> several null block devices will result in hangs. All hanging 
> >>>> guests are in partition detection code waiting for an I/O to return
> >>>> so very early maybe even the first I/O.
> >>>>
> >>>> Reverting that commit "fixes" the hangs.
> >>>> Any ideas?
> >>>
> >>> Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
> >>> have a reproducer for x86? Or could you collect backtraces for all the threads
> >>> in QEMU when it hangs?
> >>>
> >>> My long shot is that the main loop is blocked at aio_context_acquire(ctx),
> >>> while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
> >>
> >> Here is a backtrace on s390. I need 2 or more disks, (one is not enough).
> > 
> > It shows iothreads and main loop are all waiting for events, and the vcpu
> > threads are running guest code.
> > 
> > It could be the requests being leaked. Do you see this problem with a regular
> > file based image or null-co driver? Maybe we're missing something about the
> > AioContext in block/null.c.
> 
> It seems to run with normal file based images. As soon as I have two or more null-aio
> devices it hangs pretty soon when doing a reboot loop.
> 

Ahh! If it's a reboot loop, the device reset path may get fishy. I suspect
the completion BH used by null-aio may be messed up, which is why I wonder
whether null-co:// would work for you. Could you test that?
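
For reference, switching a disk from the AIO-based to the coroutine-based null driver only requires changing the driver= value in the domain XML shown earlier. This is a hypothetical variant of the first disk; the id and size are kept from the original:

```xml
<!-- Original, AIO-based null backend: -->
<qemu:arg value='-drive'/>
<qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
<!-- Coroutine-based null backend, for comparison: -->
<qemu:arg value='-drive'/>
<qemu:arg value='driver=null-co,id=null1,if=none,size=100G'/>
```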

Also, could you try below patch with null-aio://, too?

Thanks,
Fam

---

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cd539aa..c87b444 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -652,15 +652,11 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 {
     VirtIOBlock *s = VIRTIO_BLK(vdev);
 
-    if (s->dataplane) {
-        virtio_blk_data_plane_stop(s->dataplane);
-    }
-
-    /*
-     * This should cancel pending requests, but can't do nicely until there
-     * are per-device request lists.
-     */
     blk_drain_all();
+    if (s->dataplane) {
+        virtio_blk_data_plane_stop(s->dataplane);
+    }
+
     blk_set_enable_write_cache(s->blk, s->original_wce);
 }


* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-10  9:34         ` Fam Zheng
@ 2015-06-10 10:31           ` Christian Borntraeger
  2015-07-16 11:03           ` Christian Borntraeger
  1 sibling, 0 replies; 13+ messages in thread
From: Christian Borntraeger @ 2015-06-10 10:31 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

Am 10.06.2015 um 11:34 schrieb Fam Zheng:
> On Wed, 06/10 11:18, Christian Borntraeger wrote:
>> Am 10.06.2015 um 04:12 schrieb Fam Zheng:
>>> On Tue, 06/09 11:01, Christian Borntraeger wrote:
>>>> Am 09.06.2015 um 04:28 schrieb Fam Zheng:
>>>>> On Tue, 06/02 16:36, Christian Borntraeger wrote:
>>>>>> Paolo,
>>>>>>
>>>>>> I bisected 
>>>>>> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
>>>>>> Author:     Paolo Bonzini <pbonzini@redhat.com>
>>>>>> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
>>>>>> Commit:     Kevin Wolf <kwolf@redhat.com>
>>>>>> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>>>>>>
>>>>>>     iothread: release iothread around aio_poll
>>>>>>
>>>>>> to cause a problem with hanging guests.
>>>>>>
>>>>>> Having many guests all with a kernel/ramdisk (via -kernel) and
>>>>>> several null block devices will result in hangs. All hanging 
>>>>>> guests are in partition detection code waiting for an I/O to return
>>>>>> so very early maybe even the first I/O.
>>>>>>
>>>>>> Reverting that commit "fixes" the hangs.
>>>>>> Any ideas?
>>>>>
>>>>> Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
>>>>> have a reproducer for x86? Or could you collect backtraces for all the threads
>>>>> in QEMU when it hangs?
>>>>>
>>>>> My long shot is that the main loop is blocked at aio_context_acquire(ctx),
>>>>> while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
>>>>
>>>> Here is a backtrace on s390. I need 2 or more disks, (one is not enough).
>>>
>>> It shows iothreads and main loop are all waiting for events, and the vcpu
>>> threads are running guest code.
>>>
>>> It could be the requests being leaked. Do you see this problem with a regular
>>> file based image or null-co driver? Maybe we're missing something about the
>>> AioContext in block/null.c.
>>
>> It seems to run with normal file based images. As soon as I have two or more null-aio
>> devices it hangs pretty soon when doing a reboot loop.
>>
> 
> Ahh! If it's a reboot loop, the device reset thing may get fishy. I suspect the
> completion BH used by null-aio may be messed up, that's why I wonder whether
> null-co:// would work for you. Could you test that?

null-co also fails.

> 
> Also, could you try below patch with null-aio://, too?

The same. Guests still get stuck.



* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-06-10  9:34         ` Fam Zheng
  2015-06-10 10:31           ` Christian Borntraeger
@ 2015-07-16 11:03           ` Christian Borntraeger
  2015-07-16 11:20             ` Paolo Bonzini
  1 sibling, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2015-07-16 11:03 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

Am 10.06.2015 um 11:34 schrieb Fam Zheng:
> On Wed, 06/10 11:18, Christian Borntraeger wrote:
>> Am 10.06.2015 um 04:12 schrieb Fam Zheng:
>>> On Tue, 06/09 11:01, Christian Borntraeger wrote:
>>>> Am 09.06.2015 um 04:28 schrieb Fam Zheng:
>>>>> On Tue, 06/02 16:36, Christian Borntraeger wrote:
>>>>>> Paolo,
>>>>>>
>>>>>> I bisected 
>>>>>> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
>>>>>> Author:     Paolo Bonzini <pbonzini@redhat.com>
>>>>>> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
>>>>>> Commit:     Kevin Wolf <kwolf@redhat.com>
>>>>>> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>>>>>>
>>>>>>     iothread: release iothread around aio_poll
>>>>>>
>>>>>> to cause a problem with hanging guests.
>>>>>>
>>>>>> Having many guests all with a kernel/ramdisk (via -kernel) and
>>>>>> several null block devices will result in hangs. All hanging 
>>>>>> guests are in partition detection code waiting for an I/O to return
>>>>>> so very early maybe even the first I/O.
>>>>>>
>>>>>> Reverting that commit "fixes" the hangs.
>>>>>> Any ideas?


For what it's worth, I can no longer reproduce the issue on
current master + cherry-pick of a0710f7995f (iothread: release iothread around aio_poll)

bisect tells me that

commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
Author:     Fam Zheng <famz@redhat.com>
AuthorDate: Fri May 29 18:53:14 2015 +0800
Commit:     Stefan Hajnoczi <stefanha@redhat.com>
CommitDate: Tue Jul 7 14:27:14 2015 +0100

    block: Use bdrv_drain to replace uncessary bdrv_drain_all

made the problem with blk-null go away. I still don't understand why.

Christian
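The commit bisected to above replaces global drains with per-device ones. The difference can be sketched with a toy model; the names are hypothetical, and QEMU's real bdrv_drain()/bdrv_drain_all() poll AioContexts rather than counters:

```c
/* Toy contrast between a per-device drain and a global drain. */
typedef struct {
    int in_flight;  /* pending requests on this device */
} ToyDev;

/* Per-device drain: waits only for this device's requests. */
static void toy_drain_one(ToyDev *d)
{
    while (d->in_flight > 0) {
        d->in_flight--;  /* stand-in for one completed request */
    }
}

/* Global drain: cannot finish until *every* device has quiesced, so its
 * behavior depends on how many disks the guest has. */
static void toy_drain_all(ToyDev devs[], int n)
{
    for (int i = 0; i < n; i++) {
        toy_drain_one(&devs[i]);
    }
}
```

This is only a sketch of the scoping difference, not an explanation of the hang itself.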

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-07-16 11:03           ` Christian Borntraeger
@ 2015-07-16 11:20             ` Paolo Bonzini
  2015-07-16 11:24               ` Christian Borntraeger
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2015-07-16 11:20 UTC (permalink / raw)
  To: Christian Borntraeger, Fam Zheng; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi



On 16/07/2015 13:03, Christian Borntraeger wrote:
> For what it's worth, I can no longer reproduce the issue on
> current master + cherry-pick of a0710f7995f (iothread: release iothread around aio_poll)
> 
> bisect tells me that
> 
> commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
> Author:     Fam Zheng <famz@redhat.com>
> AuthorDate: Fri May 29 18:53:14 2015 +0800
> Commit:     Stefan Hajnoczi <stefanha@redhat.com>
> CommitDate: Tue Jul 7 14:27:14 2015 +0100
> 
>     block: Use bdrv_drain to replace uncessary bdrv_drain_all
> 
> made the problem with blk-null go away. I still don't understand why.

It could be related to the AioContext problem that I'm fixing these
days, too.  Good news, we'll requeue the patch for 2.5.

Paolo
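For context, the pattern introduced by the bisected commit (a0710f7995f) is releasing the iothread's lock while blocked in the poll. A minimal sketch of that pattern with pthreads follows; the names are hypothetical, and QEMU's actual code uses aio_context_acquire()/aio_context_release() around aio_poll():

```c
#include <pthread.h>

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static int polls;

static void fake_blocking_poll(void)
{
    polls++;  /* stand-in for a blocking aio_poll(ctx, true) */
}

/* Caller holds ctx_lock on entry; it is held again on exit.  Dropping it
 * around the poll lets other threads (e.g. vcpu threads submitting I/O)
 * acquire the context while the iothread is blocked. */
static void iothread_iteration(void)
{
    pthread_mutex_unlock(&ctx_lock);
    fake_blocking_poll();
    pthread_mutex_lock(&ctx_lock);
}
```

The subtlety the thread is chasing lives in what happens between the unlock and the relock, which this sketch deliberately leaves empty.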

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-07-16 11:20             ` Paolo Bonzini
@ 2015-07-16 11:24               ` Christian Borntraeger
  2015-07-16 11:37                 ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2015-07-16 11:24 UTC (permalink / raw)
  To: Paolo Bonzini, Fam Zheng; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

Am 16.07.2015 um 13:20 schrieb Paolo Bonzini:
> 
> 
> On 16/07/2015 13:03, Christian Borntraeger wrote:
>> For what it's worth, I can no longer reproduce the issue on
>> current master + cherry-pick of a0710f7995f (iothread: release iothread around aio_poll)
>>
>> bisect tells me that
>>
>> commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
>> Author:     Fam Zheng <famz@redhat.com>
>> AuthorDate: Fri May 29 18:53:14 2015 +0800
>> Commit:     Stefan Hajnoczi <stefanha@redhat.com>
>> CommitDate: Tue Jul 7 14:27:14 2015 +0100
>>
>>     block: Use bdrv_drain to replace uncessary bdrv_drain_all
>>
>> made the problem with blk-null go away. I still don't understand why.
> 
> It could be related to the AioContext problem that I'm fixing these
> days, too.  Good news, we'll requeue the patch for 2.5.

That was also something that I had in mind (in fact I retested this to check
the ctx patch). master + cherry-pick of a0710f7995f + revert of 53ec73e26 + this fix
still fails, so it was (is?) a different issue. The interesting part is that this
problem required 2 or more disks (and we replace drain_all with single drains), so
it sounds somewhat plausible.

Christian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup
  2015-07-16 11:24               ` Christian Borntraeger
@ 2015-07-16 11:37                 ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2015-07-16 11:37 UTC (permalink / raw)
  To: Christian Borntraeger, Fam Zheng; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi



On 16/07/2015 13:24, Christian Borntraeger wrote:
> The interesting part is that this problem required 2 or more disks
> (and we replace drain_all with single drains), so it sounds somewhat
> plausible.

Yes, indeed.  It is very plausible.  I wanted to reproduce it these
days, so thanks for saving me a lot of time!  I'll test your exact setup
(master + AioContext fix + cherry-pick of a0710f7995f + revert of
53ec73e26).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-07-16 11:37 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-02 14:36 [Qemu-devel] "iothread: release iothread around aio_poll" causes random hangs at startup Christian Borntraeger
2015-06-02 14:51 ` Paolo Bonzini
2015-06-03  9:17   ` Stefan Hajnoczi
2015-06-09  2:28 ` Fam Zheng
2015-06-09  9:01   ` Christian Borntraeger
2015-06-10  2:12     ` Fam Zheng
2015-06-10  9:18       ` Christian Borntraeger
2015-06-10  9:34         ` Fam Zheng
2015-06-10 10:31           ` Christian Borntraeger
2015-07-16 11:03           ` Christian Borntraeger
2015-07-16 11:20             ` Paolo Bonzini
2015-07-16 11:24               ` Christian Borntraeger
2015-07-16 11:37                 ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).