From: Dietmar Maurer <dietmar@proxmox.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, Sergio Lopez <slp@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"jsnow@redhat.com" <jsnow@redhat.com>
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Tue, 31 Mar 2020 18:18:18 +0200 (CEST)
Message-ID: <518198448.62.1585671498399@webmail.proxmox.com>
In-Reply-To: <20200331153719.GI7030@linux.fritz.box>


> On March 31, 2020 5:37 PM Kevin Wolf <kwolf@redhat.com> wrote:
> 
>  
> Am 31.03.2020 um 17:24 hat Dietmar Maurer geschrieben:
> > 
> > > > How can I see/debug those waiting request?
> > > 
> > > Examine bs->tracked_requests list.
> > > 
> > > BdrvTrackedRequest has a "Coroutine *co" field. It's a pointer to the coroutine of this request. You may use the qemu-gdb script to print the request's coroutine back-trace:
> > 
> > I would, but there are no tracked requests at all.
> > 
> > print bs->tracked_requests
> > $2 = {lh_first = 0x0}
> 
> Then it's one of the recursively checked parents, as you already figured
> out.
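
(For reference, once there are tracked requests, the qemu-gdb coroutine
backtrace approach suggested above would look roughly like this - a sketch
that assumes scripts/qemu-gdb.py from the source tree is loaded into gdb:)

source scripts/qemu-gdb.py
print bs->tracked_requests.lh_first
print bs->tracked_requests.lh_first->co
qemu coroutine bs->tracked_requests.lh_first->co
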
> 
> > Looks like bdrv_parent_drained_poll_single() calls
> > blk_root_drained_poll(), which returns true in my case (in_flight > 5).
> 
> Can you identify which BlockBackend is this? Specifically if it's the
> one attached to a guest device or whether it belongs to the block job.

This can trigger from various places, but the simplest case is when it is
called from drive_backup_prepare

>   bdrv_drained_begin(bs);

which is the backup source drive.

Note: I have added printf debug statements to the source, so the line numbers may not match exactly.

#0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000555555c60479 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x0000555555c60479 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x0000555555c62c61 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffda68, timeout=-1) at util/fdmon-poll.c:79
#4  0x0000555555c62247 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x0000555555bc25d5 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fffe8954bc0)
    at block/io.c:430
#6  0x0000555555bc25d5 in bdrv_do_drained_begin
    (bs=0x7fffe8954bc0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
#7  0x000055555595f753 in drive_backup_prepare (common=0x7fffe560c200, errp=0x7fffffffdba8) at blockdev.c:1759
#8  0x0000555555961f0a in qmp_transaction (dev_list=dev_list@entry=0x7fffffffdc20, has_props=has_props@entry=false, props=0x7fffe560e0e0, 
    props@entry=0x0, errp=errp@entry=0x7fffffffdc58) at blockdev.c:2406
#9  0x00005555559633c2 in blockdev_do_action (errp=0x7fffffffdc58, action=0x7fffffffdc10) at blockdev.c:1054
#10 0x00005555559633c2 in qmp_drive_backup (backup=backup@entry=0x7fffffffdc60, errp=errp@entry=0x7fffffffdc58) at blockdev.c:3134
#11 0x0000555555b1e1d1 in qmp_marshal_drive_backup (args=<optimized out>, ret=<optimized out>, errp=0x7fffffffdd38) at qapi/qapi-commands-block-core.c:555
#12 0x0000555555c15b08 in qmp_dispatch (cmds=0x555556162000 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:155
#13 0x0000555555affca1 in monitor_qmp_dispatch (mon=0x7fffe8915d80, req=<optimized out>) at monitor/qmp.c:145
#14 0x0000555555b0048a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
#15 0x0000555555c5e9d5 in aio_bh_call (bh=0x7fffe8e8f960) at util/async.c:164
#16 0x0000555555c5e9d5 in aio_bh_poll (ctx=ctx@entry=0x7fffe8f3ef80) at util/async.c:164
#17 0x0000555555c6208e in aio_dispatch (ctx=0x7fffe8f3ef80) at util/aio-posix.c:380
#18 0x0000555555c5e8be in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#19 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x0000555555c612f8 in glib_pollfds_poll () at util/main-loop.c:219
#21 0x0000555555c612f8 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#22 0x0000555555c612f8 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#23 0x00005555558fc5a9 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
#24 0x0000555555800c3e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
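
As for which BlockBackend that is - a rough way to inspect the parents from
gdb would be something like this (a sketch; the BdrvChild field names and the
cast of ->opaque to BlockBackend are assumptions, so adjust as needed):

print bs->parents.lh_first
print bs->parents.lh_first->name
print ((BlockBackend *)bs->parents.lh_first->opaque)->in_flight
print bs->parents.lh_first->next_parent.le_next

(the last command walks to the next parent, if any)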

> 
> Maybe have a look at the job coroutine, too. You can probably find it
> most easily in the 'jobs' list, and then print the coroutine backtrace
> for job->co.

This is in drive_backup_prepare(), before the job gets created, so there is no job yet.
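
(Once the job exists, inspecting it should look roughly like this, using the
static 'jobs' list from job.c and the qemu-gdb coroutine command - a sketch,
with scripts/qemu-gdb.py loaded as above:)

print jobs.lh_first
print jobs.lh_first->co
qemu coroutine jobs.lh_first->co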

> > Looks like I am losing poll events somewhere?
> 
> I don't think we've lost any event if in_flight > 0. It means that
> something is still supposedly active. Maybe the job deadlocked.

This is a simple call to bdrv_drained_begin(bs) (before we set up the job).
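
For context, what that call ends up waiting on is roughly the following
(paraphrased from block/io.c and block/block-backend.c, not verbatim):

    /* bdrv_do_drained_begin(), heavily simplified */
    bdrv_parent_drained_begin(bs, parent, ignore_bds_parents);
    if (poll) {
        /* runs aio_poll() until the condition becomes false - this is
         * the loop the backtrace above is stuck in */
        BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, recursive, parent, false));
    }

    /* bdrv_drain_poll(), simplified: still "busy" while any parent
     * reports activity or the node itself has requests in flight */
    if (bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)) {
        return true;
    }
    return atomic_read(&bs->in_flight);

    /* blk_root_drained_poll(): the parent BlockBackend counts as busy
     * as long as blk->in_flight > 0 - the in_flight > 5 seen here */
    return !!blk->in_flight;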

Is really nobody else able to reproduce this (has somebody already tried to reproduce it)?



Thread overview: 30+ messages
2020-03-31  8:46 bdrv_drained_begin deadlock with io-threads Dietmar Maurer
2020-03-31  9:17 ` Dietmar Maurer
2020-03-31  9:33   ` Dietmar Maurer
2020-03-31 12:58 ` Kevin Wolf
2020-03-31 14:32   ` Dietmar Maurer
2020-03-31 14:53     ` Vladimir Sementsov-Ogievskiy
2020-03-31 15:24       ` Dietmar Maurer
2020-03-31 15:37         ` Kevin Wolf
2020-03-31 16:18           ` Dietmar Maurer [this message]
2020-04-01 10:37             ` Kevin Wolf
2020-04-01 15:37               ` Dietmar Maurer
2020-04-01 15:50                 ` Dietmar Maurer
2020-04-01 18:12                 ` Kevin Wolf
2020-04-01 18:28                   ` Dietmar Maurer
2020-04-01 18:44                     ` Kevin Wolf
2020-04-02  6:48                       ` Dietmar Maurer
2020-04-02  9:10                       ` Dietmar Maurer
2020-04-02 12:14                         ` Kevin Wolf
2020-04-02 14:25                           ` Kevin Wolf
2020-04-02 15:40                             ` Dietmar Maurer
2020-04-02 16:47                               ` Kevin Wolf
2020-04-02 17:10                                 ` Kevin Wolf
2020-04-03  6:48                                   ` Thomas Lamprecht
2020-04-03  8:26                                   ` Dietmar Maurer
2020-04-03  8:47                                     ` Kevin Wolf
2020-04-03 16:31                                       ` Dietmar Maurer
2020-04-06  8:31                                         ` Kevin Wolf
2020-04-02 15:44                             ` Dietmar Maurer
2020-04-01 18:35                   ` Kevin Wolf
2020-04-02  9:21                   ` Dietmar Maurer
