From: Christian Borntraeger
Message-ID: <56EACA64.9060402@de.ibm.com>
Date: Thu, 17 Mar 2016 16:16:52 +0100
In-Reply-To: <56EACA22.2020505@de.ibm.com>
References: <1458123018-18651-1-git-send-email-famz@redhat.com>
 <56E9355A.5070700@redhat.com> <56E93A22.1080102@de.ibm.com>
 <56E93ECE.10103@redhat.com> <56E9425C.8030201@de.ibm.com>
 <56E957AD.2050005@redhat.com> <56E961EA.4090908@de.ibm.com>
 <56EAA170.1000904@linux.vnet.ibm.com> <56EAA576.8020709@de.ibm.com>
 <56EAC706.2040006@redhat.com> <56EACA22.2020505@de.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 0/4] Tweaks around virtio-blk start/stop
To: Paolo Bonzini, tu bo, Fam Zheng, qemu-devel@nongnu.org
Cc: Kevin Wolf, cornelia.huck@de.ibm.com, Stefan Hajnoczi,
 qemu-block@nongnu.org, "Michael S. Tsirkin"

On 03/17/2016 04:15 PM, Christian Borntraeger wrote:
> On 03/17/2016 04:02 PM, Paolo Bonzini wrote:
>> On 17/03/2016 13:39, Christian Borntraeger wrote:
>>> As an interesting side note, I updated my system from F20 to F23 some days ago
>>> (after the initial report).
>>> While Tu Bo is still on an F20 system, I was not able
>>> to reproduce the original crash on F23, but going back to F20 made this
>>> problem re-appear.
>>>
>>> Stack trace of thread 26429:
>>> #0  0x00000000802008aa tracked_request_begin (qemu-system-s390x)
>>> #1  0x0000000080203f3c bdrv_co_do_preadv (qemu-system-s390x)
>>> #2  0x000000008020567c bdrv_co_do_readv (qemu-system-s390x)
>>> #3  0x000000008025d0f4 coroutine_trampoline (qemu-system-s390x)
>>> #4  0x000003ff943d150a __makecontext_ret (libc.so.6)
>>>
>>> This is with patches 2-4 plus the removal of virtio_queue_host_notifier_read.
>>>
>>> Without removing virtio_queue_host_notifier_read, I get the same mutex lockup (as expected).
>>>
>>> Maybe we have two independent issues here and this is some old bug in glibc or
>>> whatever?
>>
>> I'm happy to try and reproduce on x86 if you give me some instructions
>> (RHEL 7 should be close enough to Fedora 20).
>>
>> Can you add an assert in virtio_blk_handle_output to catch reentrancy, like
>
> That was quick (let me know if I should recompile with debugging).
>
> (gdb) thread apply all bt
>
> Thread 5 (Thread 0x3ff7b8ff910 (LWP 236419)):
> #0  0x000003ff7cdfcf56 in syscall () from /lib64/libc.so.6
> #1  0x000000001022452e in futex_wait (val=, ev=) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:292
> #2  qemu_event_wait (ev=ev@entry=0x1082b5c4) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:399
> #3  0x000000001023353a in call_rcu_thread (opaque=) at /home/cborntra/REPOS/qemu/util/rcu.c:250
> #4  0x000003ff7cf084c6 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff7ce02ec2 in thread_start () from /lib64/libc.so.6
>
> Thread 4 (Thread 0x3ff78eca910 (LWP 236426)):
> #0  0x000003ff7cdf819a in ioctl () from /lib64/libc.so.6
> #1  0x000000001005ddf8 in kvm_vcpu_ioctl (cpu=cpu@entry=0x10c27d40, type=type@entry=44672) at /home/cborntra/REPOS/qemu/kvm-all.c:1984
> #2  0x000000001005df1c in kvm_cpu_exec (cpu=cpu@entry=0x10c27d40) at /home/cborntra/REPOS/qemu/kvm-all.c:1834
> #3  0x000000001004b1be in qemu_kvm_cpu_thread_fn (arg=0x10c27d40) at /home/cborntra/REPOS/qemu/cpus.c:1050
> #4  0x000003ff7cf084c6 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff7ce02ec2 in thread_start () from /lib64/libc.so.6
>
> Thread 3 (Thread 0x3ff7e8dcbb0 (LWP 236395)):
> #0  0x000003ff7cdf66e6 in ppoll () from /lib64/libc.so.6
> #1  0x00000000101a5e08 in ppoll (__ss=0x0, __timeout=0x3ffd6afe8a0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=1034000000) at /home/cborntra/REPOS/qemu/qemu-timer.c:325
> #3  0x00000000101a56f2 in os_host_main_loop_wait (timeout=1034000000) at /home/cborntra/REPOS/qemu/main-loop.c:251
> #4  main_loop_wait (nonblocking=) at /home/cborntra/REPOS/qemu/main-loop.c:505
> #5  0x00000000100136d6 in main_loop () at /home/cborntra/REPOS/qemu/vl.c:1933
> #6  main (argc=, argv=, envp=) at /home/cborntra/REPOS/qemu/vl.c:4656
>
> Thread 2 (Thread 0x3ff7b0ff910 (LWP 236421)):
> #0  0x000003ff7cdf66e6 in ppoll () from /lib64/libc.so.6
> #1  0x00000000101a5e28 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:313
> #3  0x00000000101a727c in aio_poll (ctx=0x10880560, blocking=) at /home/cborntra/REPOS/qemu/aio-posix.c:453
> #4  0x00000000100d39f0 in iothread_run (opaque=0x10880020) at /home/cborntra/REPOS/qemu/iothread.c:46
> #5  0x000003ff7cf084c6 in start_thread () from /lib64/libpthread.so.0
> #6  0x000003ff7ce02ec2 in thread_start () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x3ff57fff910 (LWP 236427)):
> #0  0x000003ff7cd3b650 in raise () from /lib64/libc.so.6
> #1  0x000003ff7cd3ced8 in abort () from /lib64/libc.so.6
> #2  0x000003ff7cd33666 in __assert_fail_base () from /lib64/libc.so.6
> #3  0x000003ff7cd336f4 in __assert_fail () from /lib64/libc.so.6
> #4  0x000000001007a3c4 in virtio_blk_handle_output (vdev=, vq=) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:595
> #5  0x000000001009390e in virtio_queue_notify_vq (vq=0x10d77c70) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1095
> #6  0x0000000010095894 in virtio_queue_notify_vq (vq=) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1091
> #7  virtio_queue_notify (vdev=, n=n@entry=0) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1101
> #8  0x00000000100a17c8 in virtio_ccw_hcall_notify (args=) at /home/cborntra/REPOS/qemu/hw/s390x/s390-virtio-ccw.c:66
> #9  0x000000001009c210 in s390_virtio_hypercall (env=env@entry=0x10c75aa0) at /home/cborntra/REPOS/qemu/hw/s390x/s390-virtio-hcall.c:35
> #10 0x00000000100cb4e8 in handle_hypercall (run=, cpu=0x10c6d7d0) at /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1283
> #11 handle_diag (ipb=, run=0x3ff78680000, cpu=0x10c6d7d0) at /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1352

FWIW, this looks like we still have a case without eventfd during reboot or startup.

> #12 handle_instruction (run=0x3ff78680000, cpu=0x10c6d7d0) at /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1799
> #13 handle_intercept (cpu=0x10c6d7d0) at /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1842
> #14 kvm_arch_handle_exit (cs=cs@entry=0x10c6d7d0, run=run@entry=0x3ff78680000) at /home/cborntra/REPOS/qemu/target-s390x/kvm.c:2028
> #15 0x000000001005df70 in kvm_cpu_exec (cpu=cpu@entry=0x10c6d7d0) at /home/cborntra/REPOS/qemu/kvm-all.c:1921
> #16 0x000000001004b1be in qemu_kvm_cpu_thread_fn (arg=0x10c6d7d0) at /home/cborntra/REPOS/qemu/cpus.c:1050
> #17 0x000003ff7cf084c6 in start_thread () from /lib64/libpthread.so.0
> #18 0x000003ff7ce02ec2 in thread_start () from /lib64/libc.so.6