Message-ID: <53BB9B6F.4080201@de.ibm.com>
Date: Tue, 08 Jul 2014 09:19:11 +0200
From: Christian Borntraeger
References: <53BA8B49.9050709@de.ibm.com>
In-Reply-To: <53BA8B49.9050709@de.ibm.com>
Subject: Re: [Qemu-devel] another locking issue in current dataplane code?
To: Stefan Hajnoczi
Cc: Cornelia Huck, Kevin Wolf, ming.lei@canonical.com, "qemu-devel@nongnu.org", Dominik Dingel

Ping. Has anyone seen a similar hang on x86?

On 07/07/14 13:58, Christian Borntraeger wrote:
> Folks,
>
> with current 2.1-rc0 (
> + dataplane: do not free VirtQueueElement in vring_push()
> + virtio-blk: avoid dataplane VirtIOBlockReq early free
> + some not-ready-yet s390 patches for migration
> )
>
> I am still having issues with dataplane during managedsave (without dataplane everything seems to work fine):
>
> With 1 CPU and 1 disk (and some workload, e.g. a simple dd on the disk) I get:
>
>
> Thread 3 (Thread 0x3fff90fd910 (LWP 27218)):
> #0 0x000003fffcdb7ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x000003fffcdbac0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
> #2 0x000003fffcdb399a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #3 0x00000000801fff06 in qemu_cond_wait (cond=, mutex=mutex@entry=0x8037f788 ) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
> #4 0x00000000800472f4 in qemu_kvm_wait_io_event (cpu=) at /home/cborntra/REPOS/qemu/cpus.c:843
> #5 qemu_kvm_cpu_thread_fn (arg=0x809ad6b0) at /home/cborntra/REPOS/qemu/cpus.c:879
> #6 0x000003fffcdaf412 in start_thread () from /lib64/libpthread.so.0
> #7 0x000003fffba350ae in thread_start () from /lib64/libc.so.6
>
> Thread 2 (Thread 0x3fff88fd910 (LWP 27219)):
> #0 0x000003fffba2a8e0 in ppoll () from /lib64/libc.so.6
> #1 0x00000000801af250 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
> #2 qemu_poll_ns (fds=fds@entry=0x3fff40010c0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:314
> #3 0x00000000801b0702 in aio_poll (ctx=0x807f2230, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
> #4 0x00000000800be3c4 in iothread_run (opaque=0x807f20d8) at /home/cborntra/REPOS/qemu/iothread.c:41
> #5 0x000003fffcdaf412 in start_thread () from /lib64/libpthread.so.0
> #6 0x000003fffba350ae in thread_start () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x3fff9c529b0 (LWP 27215)):
> #0 0x000003fffcdb38f0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x00000000801fff06 in qemu_cond_wait (cond=cond@entry=0x807f22c0, mutex=mutex@entry=0x807f2290) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
> #2 0x0000000080212906 in rfifolock_lock (r=r@entry=0x807f2290) at /home/cborntra/REPOS/qemu/util/rfifolock.c:59
> #3 0x000000008019e536 in aio_context_acquire (ctx=ctx@entry=0x807f2230) at /home/cborntra/REPOS/qemu/async.c:295
> #4 0x00000000801a34e6 in bdrv_drain_all () at /home/cborntra/REPOS/qemu/block.c:1907
> #5 0x0000000080048e24 in do_vm_stop (state=RUN_STATE_PAUSED) at /home/cborntra/REPOS/qemu/cpus.c:538
> #6 vm_stop (state=state@entry=RUN_STATE_PAUSED) at /home/cborntra/REPOS/qemu/cpus.c:1221
> #7 0x00000000800e6338 in qmp_stop (errp=errp@entry=0x3ffffa9dc00) at /home/cborntra/REPOS/qemu/qmp.c:98
> #8 0x00000000800e1314 in qmp_marshal_input_stop (mon=, qdict=, ret=) at qmp-marshal.c:2806
> #9 0x000000008004b91a in qmp_call_cmd (cmd=, params=0x8096cf50, mon=0x8080b8a0) at /home/cborntra/REPOS/qemu/monitor.c:5038
> #10 handle_qmp_command (parser=, tokens=) at /home/cborntra/REPOS/qemu/monitor.c:5104
> #11 0x00000000801faf16 in json_message_process_token (lexer=0x8080b7c0, token=0x808f2610, type=, x=, y=6) at /home/cborntra/REPOS/qemu/qobject/json-streamer.c:87
> #12 0x0000000080212bac in json_lexer_feed_char (lexer=lexer@entry=0x8080b7c0, ch=, flush=flush@entry=false) at /home/cborntra/REPOS/qemu/qobject/json-lexer.c:303
> #13 0x0000000080212cfe in json_lexer_feed (lexer=0x8080b7c0, buffer=, size=) at /home/cborntra/REPOS/qemu/qobject/json-lexer.c:356
> #14 0x00000000801fb10e in json_message_parser_feed (parser=, buffer=, size=) at /home/cborntra/REPOS/qemu/qobject/json-streamer.c:110
> #15 0x0000000080049f28 in monitor_control_read (opaque=, buf=, size=) at /home/cborntra/REPOS/qemu/monitor.c:5125
> #16 0x00000000800c8636 in qemu_chr_be_write (len=1, buf=0x3ffffa9e010 "}[B\377\373\251\372\b", s=0x807f5af0) at /home/cborntra/REPOS/qemu/qemu-char.c:213
> #17 tcp_chr_read (chan=, cond=, opaque=0x807f5af0) at /home/cborntra/REPOS/qemu/qemu-char.c:2690
> #18 0x000003fffcc9f05a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #19 0x00000000801ae3e0 in glib_pollfds_poll () at /home/cborntra/REPOS/qemu/main-loop.c:190
> #20 os_host_main_loop_wait (timeout=) at /home/cborntra/REPOS/qemu/main-loop.c:235
> #21 main_loop_wait (nonblocking=) at /home/cborntra/REPOS/qemu/main-loop.c:484
> #22 0x00000000800169e2 in main_loop () at /home/cborntra/REPOS/qemu/vl.c:2024
> #23 main (argc=, argv=, envp=) at /home/cborntra/REPOS/qemu/vl.c:4551
>
> Now, if aio_poll never returns, we have a deadlock here.
> To me it looks like aio_poll could be called from iothread_run even if there are no outstanding requests.
> Opinions?
>
> Christian
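
For context when reading this in the archive: the loop Christian points at is
iothread_run() in iothread.c (frame #4 of thread 2). Below is a rough sketch of
what that loop does in the 2.1-rc tree -- paraphrased, not verbatim source, and
the IOThread field names (ctx, stopping) are an assumption -- to illustrate the
hypothesis: the iothread takes the AioContext lock and then calls aio_poll()
with blocking=true, so if that poll never returns, the lock is never released
and the main thread stays stuck in aio_context_acquire() under bdrv_drain_all()
(thread 1, frames #3/#4).

/* Rough sketch of the dataplane iothread loop (iothread.c, QEMU 2.1-rc era).
 * Paraphrased; IOThread field names are an assumption, not verbatim source. */
static void *iothread_run(void *opaque)
{
    IOThread *iothread = opaque;

    while (!iothread->stopping) {
        /* Thread 2 is parked inside this critical section in the backtrace. */
        aio_context_acquire(iothread->ctx);

        /* Blocking poll: if no request ever completes and nothing kicks the
         * event notifier (the scenario described above), ppoll() sits here
         * and the release below is never reached. */
        while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
            /* Progress was made, poll again. */
        }

        aio_context_release(iothread->ctx);
    }
    return NULL;
}

Meanwhile the main thread wants the same lock, which is exactly what the
qmp_stop backtrace shows: vm_stop() -> do_vm_stop() -> bdrv_drain_all() ->
aio_context_acquire() -> rfifolock_lock() -> qemu_cond_wait(), i.e. it waits
for the iothread to release the AioContext before it can drain the dataplane
disk.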