From: Kevin Wolf <kwolf@redhat.com>
To: Dietmar Maurer <dietmar@proxmox.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org, Sergio Lopez <slp@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Max Reitz <mreitz@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
"jsnow@redhat.com" <jsnow@redhat.com>
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Wed, 1 Apr 2020 12:37:48 +0200
Message-ID: <20200401103748.GA4680@linux.fritz.box>
In-Reply-To: <518198448.62.1585671498399@webmail.proxmox.com>
On 31.03.2020 at 18:18, Dietmar Maurer wrote:
> > > Looks like bdrv_parent_drained_poll_single() calls
> > > blk_root_drained_poll(), which returns true in my case (in_flight > 5).
> >
> > Can you identify which BlockBackend is this? Specifically if it's the
> > one attached to a guest device or whether it belongs to the block job.
>
> This can be triggered from various places, but the simplest case is when
> it's called from drive_backup_prepare():
>
> > bdrv_drained_begin(bs);
>
> which is the backup source drive.
I mean the BlockBackend for which blk_root_drained_poll() is called.
> > Maybe have a look at the job coroutine, too. You can probably most easily
> > find it in the 'jobs' list, and then print the coroutine backtrace for
> > job->co.
>
> This is in drive_backup_prepare(), before the job gets created.
Oh, I see. Then it can't be the job's BlockBackend, of course.
> > > Looks like I am losing poll events somewhere?
> >
> > I don't think we've lost any event if in_flight > 0. It means that
> > something is still supposedly active. Maybe the job deadlocked.
>
> This is a simple call to bdrv_drained_begin(bs) (before we set up the job).
>
> Is nobody else really able to reproduce this (has somebody already tried)?
I can get hangs, but that's for job_completed(), not for starting the
job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
like a different case to me.
In my case, the hanging request looks like this:
(gdb) qemu coroutine 0x556e055750e0
#0 0x0000556e03999150 in qemu_coroutine_switch (from_=from_@entry=0x556e055750e0, to_=to_@entry=0x7fd34bbeb5b8, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
#1 0x0000556e03997e31 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2 0x0000556e0397fc88 in thread_pool_submit_co (pool=0x7fd33c003120, func=func@entry=0x556e038d59a0 <handle_aiocb_rw>, arg=arg@entry=0x7fd2d2b96440) at util/thread-pool.c:289
#3 0x0000556e038d511d in raw_thread_pool_submit (bs=bs@entry=0x556e04e459b0, func=func@entry=0x556e038d59a0 <handle_aiocb_rw>, arg=arg@entry=0x7fd2d2b96440) at block/file-posix.c:1894
#4 0x0000556e038d58c3 in raw_co_prw (bs=0x556e04e459b0, offset=230957056, bytes=4096, qiov=0x7fd33c006fe0, type=1) at block/file-posix.c:1941
Checking the thread pool request:
(gdb) p *((ThreadPool*)0x7fd33c003120).head .lh_first
$9 = {common = {aiocb_info = 0x556e03f43f80 <thread_pool_aiocb_info>, bs = 0x0, cb = 0x556e0397f670 <thread_pool_co_cb>, opaque = 0x7fd2d2b96400, refcnt = 1}, pool = 0x7fd33c003120,
func = 0x556e038d59a0 <handle_aiocb_rw>, arg = 0x7fd2d2b96440, state = THREAD_DONE, ret = 0, reqs = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x0}}, all = {le_next = 0x0,
le_prev = 0x7fd33c0031d0}}
So apparently the request is THREAD_DONE, but the coroutine was never
reentered. I saw one case where ctx.bh_list was empty, but I also have a
case where a BH sits there scheduled and apparently just doesn't get
run:
(gdb) p *((ThreadPool*)0x7fd33c003120).ctx.bh_list .slh_first
$13 = {ctx = 0x556e04e41a10, cb = 0x556e0397f8e0 <thread_pool_completion_bh>, opaque = 0x7fd33c003120, next = {sle_next = 0x0}, flags = 3}
Stefan, I wonder if this is related to the recent changes to the BH
implementation.
Kevin