From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Ming Lei <ming.lei@canonical.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>,
	Kevin Wolf <kwolf@redhat.com>,
	Dominik Dingel <dingel@linux.vnet.ibm.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] another locking issue in current dataplane code?
Date: Tue, 08 Jul 2014 10:38:39 +0200
Message-ID: <53BBAE0F.4080607@de.ibm.com>
In-Reply-To: <CACVXFVNgooea2X65THVo32tehk0sA0PdQGYde1OwbitWj+Zw6A@mail.gmail.com>

On 08/07/14 09:43, Ming Lei wrote:
> On Tue, Jul 8, 2014 at 3:19 PM, Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>> Ping.
>>
>> Has anyone seen a similar hang on x86?
>>
>>
>>
>> On 07/07/14 13:58, Christian Borntraeger wrote:
>>> Folks,
>>>
>>> with current 2.1-rc0 (
>>> +  dataplane: do not free VirtQueueElement in vring_push()
>>> +  virtio-blk: avoid dataplane VirtIOBlockReq early free
>>> + some not-ready yet s390 patches for migration
>>> )
>>>
>>> I am still having issues with dataplane during managedsave (without dataplane everything seems to work fine):
>>>
>>> With 1 CPU and 1 disk (and some workload, e.g. a simple dd on the disk) I get:
>>>
>>>
>>> Thread 3 (Thread 0x3fff90fd910 (LWP 27218)):
>>> #0  0x000003fffcdb7ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
>>> #1  0x000003fffcdbac0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
>>> #2  0x000003fffcdb399a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>> #3  0x00000000801fff06 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8037f788 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
>>> #4  0x00000000800472f4 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:843
>>> #5  qemu_kvm_cpu_thread_fn (arg=0x809ad6b0) at /home/cborntra/REPOS/qemu/cpus.c:879
>>> #6  0x000003fffcdaf412 in start_thread () from /lib64/libpthread.so.0
>>> #7  0x000003fffba350ae in thread_start () from /lib64/libc.so.6
>>>
>>> Thread 2 (Thread 0x3fff88fd910 (LWP 27219)):
>>> #0  0x000003fffba2a8e0 in ppoll () from /lib64/libc.so.6
>>> #1  0x00000000801af250 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
>>> #2  qemu_poll_ns (fds=fds@entry=0x3fff40010c0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:314
>>> #3  0x00000000801b0702 in aio_poll (ctx=0x807f2230, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
>>> #4  0x00000000800be3c4 in iothread_run (opaque=0x807f20d8) at /home/cborntra/REPOS/qemu/iothread.c:41
>>> #5  0x000003fffcdaf412 in start_thread () from /lib64/libpthread.so.0
>>> #6  0x000003fffba350ae in thread_start () from /lib64/libc.so.6
>>>
>>> Thread 1 (Thread 0x3fff9c529b0 (LWP 27215)):
>>> #0  0x000003fffcdb38f0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>> #1  0x00000000801fff06 in qemu_cond_wait (cond=cond@entry=0x807f22c0, mutex=mutex@entry=0x807f2290) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
>>> #2  0x0000000080212906 in rfifolock_lock (r=r@entry=0x807f2290) at /home/cborntra/REPOS/qemu/util/rfifolock.c:59
>>> #3  0x000000008019e536 in aio_context_acquire (ctx=ctx@entry=0x807f2230) at /home/cborntra/REPOS/qemu/async.c:295
>>> #4  0x00000000801a34e6 in bdrv_drain_all () at /home/cborntra/REPOS/qemu/block.c:1907
>>> #5  0x0000000080048e24 in do_vm_stop (state=RUN_STATE_PAUSED) at /home/cborntra/REPOS/qemu/cpus.c:538
>>> #6  vm_stop (state=state@entry=RUN_STATE_PAUSED) at /home/cborntra/REPOS/qemu/cpus.c:1221
>>> #7  0x00000000800e6338 in qmp_stop (errp=errp@entry=0x3ffffa9dc00) at /home/cborntra/REPOS/qemu/qmp.c:98
>>> #8  0x00000000800e1314 in qmp_marshal_input_stop (mon=<optimized out>, qdict=<optimized out>, ret=<optimized out>) at qmp-marshal.c:2806
>>> #9  0x000000008004b91a in qmp_call_cmd (cmd=<optimized out>, params=0x8096cf50, mon=0x8080b8a0) at /home/cborntra/REPOS/qemu/monitor.c:5038
>>> #10 handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /home/cborntra/REPOS/qemu/monitor.c:5104
>>> #11 0x00000000801faf16 in json_message_process_token (lexer=0x8080b7c0, token=0x808f2610, type=<optimized out>, x=<optimized out>, y=6) at /home/cborntra/REPOS/qemu/qobject/json-streamer.c:87
>>> #12 0x0000000080212bac in json_lexer_feed_char (lexer=lexer@entry=0x8080b7c0, ch=<optimized out>, flush=flush@entry=false) at /home/cborntra/REPOS/qemu/qobject/json-lexer.c:303
>>> #13 0x0000000080212cfe in json_lexer_feed (lexer=0x8080b7c0, buffer=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/qobject/json-lexer.c:356
>>> #14 0x00000000801fb10e in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/qobject/json-streamer.c:110
>>> #15 0x0000000080049f28 in monitor_control_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/monitor.c:5125
>>> #16 0x00000000800c8636 in qemu_chr_be_write (len=1, buf=0x3ffffa9e010 "}[B\377\373\251\372\b", s=0x807f5af0) at /home/cborntra/REPOS/qemu/qemu-char.c:213
>>> #17 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x807f5af0) at /home/cborntra/REPOS/qemu/qemu-char.c:2690
>>> #18 0x000003fffcc9f05a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>> #19 0x00000000801ae3e0 in glib_pollfds_poll () at /home/cborntra/REPOS/qemu/main-loop.c:190
>>> #20 os_host_main_loop_wait (timeout=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:235
>>> #21 main_loop_wait (nonblocking=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:484
>>> #22 0x00000000800169e2 in main_loop () at /home/cborntra/REPOS/qemu/vl.c:2024
>>> #23 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4551
>>>
>>> Now, if aio_poll never returns, we have a deadlock here.
>>> To me it looks like aio_poll could be called from iothread_run even if there are no outstanding requests.
>>> Opinions?
> 
> I have sent out one patch to fix the issue, and the title is
> "virtio-blk: data-plane: fix save/set .complete_request in start".
> 
> Please try this patch to see if it fixes your issue.

Yes, I have seen that patch. Unfortunately it does not make a difference for the managedsave case.
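
For anyone trying to follow the traces above, here is a minimal sketch of the pattern I suspect. It is a Python model, not QEMU code, and it assumes that the iothread holds the context lock while it blocks in aio_poll; that is my reading of the backtraces, not something they prove. The names simulate, ctx_lock and wakeup are made up for illustration: ctx_lock stands in for the AioContext lock that thread 1 waits on in aio_context_acquire, and wakeup stands in for whatever would make ppoll() return in thread 2.

```python
import threading


def simulate(kick_before_acquire, timeout=0.5):
    """Return True if the 'main' thread manages to take the context lock.

    Hypothetical model: names and structure are illustrative only.
    """
    ctx_lock = threading.Lock()    # stand-in for the AioContext lock
    wakeup = threading.Event()     # stand-in for an event that ends ppoll()
    holding = threading.Event()    # signals that the iothread owns the lock

    def iothread():
        # Assumption: the lock is held across the blocking wait,
        # like a blocking aio_poll() with timeout=-1.
        with ctx_lock:
            holding.set()
            wakeup.wait()

    worker = threading.Thread(target=iothread, daemon=True)
    worker.start()
    holding.wait()                 # make sure the iothread owns the lock first

    if kick_before_acquire:
        wakeup.set()               # something wakes the poller up

    # Stand-in for aio_context_acquire() called from bdrv_drain_all();
    # a timeout is used here so the model reports the hang instead of hanging.
    got_lock = ctx_lock.acquire(timeout=timeout)
    if got_lock:
        ctx_lock.release()

    wakeup.set()                   # let the worker finish in either case
    worker.join()
    return got_lock


if __name__ == "__main__":
    print("no wakeup:  ", simulate(kick_before_acquire=False))
    print("with wakeup:", simulate(kick_before_acquire=True))
```

In this model the acquire only succeeds when something wakes the poller. With no outstanding requests and no wakeup, the main thread waits until the timeout expires, which corresponds to the hang in thread 1.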
