From: Kevin Wolf <kwolf@redhat.com>
To: Dietmar Maurer <dietmar@proxmox.com>
Cc: qemu-block@nongnu.org,
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>,
Max Reitz <mreitz@redhat.com>
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Thu, 2 Apr 2020 19:10:07 +0200 [thread overview]
Message-ID: <20200402171007.GF4006@linux.fritz.box> (raw)
In-Reply-To: <20200402164744.GE4006@linux.fritz.box>
Am 02.04.2020 um 18:47 hat Kevin Wolf geschrieben:
> Am 02.04.2020 um 17:40 hat Dietmar Maurer geschrieben:
> > > Can you reproduce the problem with my script, but pointing it to your
> > > Debian image and running stress-ng instead of dd?
> >
> > yes
> >
> > > If so, how long does
> > > it take to reproduce for you?
> >
> > I sometimes need up to 130 iterations ...
> >
> > Worse, I thought several times the bug is gone, but then it triggered again (sigh).
> >
> > But most times below 30 iteration (if you run "stress-ng -d 5").
> >
> > Also note that it can happen at different places, but always inside a drained section ...
>
> I got a stracktrace of a hanging coroutine:
>
> (gdb) qemu coroutine 0x7fd8c00132c0
> #0 0x00005630e16e9840 in qemu_coroutine_switch (from_=from_@entry=0x7fd8c00132c0, to_=to_@entry=0x7fd8cda865b8, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
> #1 0x00005630e16e8521 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
> #2 0x00005630e16e8ba5 in qemu_co_queue_wait_impl (queue=queue@entry=0x5630e48ab580, lock=lock@entry=0x0) at util/qemu-coroutine-lock.c:56
> #3 0x00005630e161f1ae in blk_wait_while_drained (blk=blk@entry=0x5630e48ab260) at /home/kwolf/source/qemu/include/qemu/lockable.h:46
> #4 0x00005630e1620600 in blk_wait_while_drained (blk=0x5630e48ab260) at block/block-backend.c:1189
> #5 0x00005630e1620600 in blk_co_pwritev_part (blk=0x5630e48ab260, offset=2922381312, bytes=856064, qiov=qiov@entry=0x7fd8c002cd70, qiov_offset=qiov_offset@entry=0, flags=0)
> at block/block-backend.c:1189
> #6 0x00005630e16207ce in blk_co_pwritev (flags=<optimized out>, qiov=0x7fd8c002cd70, bytes=<optimized out>, offset=<optimized out>, blk=<optimized out>) at block/block-backend.c:1429
> #7 0x00005630e16207ce in blk_aio_write_entry (opaque=0x7fd8c002cdc0) at block/block-backend.c:1429
> #8 0x00005630e16e98bb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
>
> So I think this is the bug: Calling blk_wait_while_drained() from
> anywhere between blk_inc_in_flight() and blk_dec_in_flight() is wrong
> because it will deadlock the drain operation.
>
> blk_aio_read/write_entry() take care of this and drop their reference
> around blk_wait_while_drained(). But if we hit the race condition that
> drain hasn't yet started there, but it has when we get to
> blk_co_preadv() or blk_co_pwritev_part(), then we're in a buggy code
> path.
With the following patch, it seems to survive for now. I'll give it some
more testing tomorrow (also qemu-iotests to check that I didn't
accidentally break something else.)
Kevin
diff --git a/block/block-backend.c b/block/block-backend.c
index 8b8f2a80a0..634acd1541 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1143,7 +1143,9 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
{
if (blk->quiesce_counter && !blk->disable_request_queuing) {
+ blk_dec_in_flight(blk);
qemu_co_queue_wait(&blk->queued_requests, NULL);
+ blk_inc_in_flight(blk);
}
}
@@ -1154,8 +1156,6 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
int ret;
BlockDriverState *bs;
- blk_wait_while_drained(blk);
-
/* Call blk_bs() only after waiting, the graph may have changed */
bs = blk_bs(blk);
trace_blk_co_preadv(blk, bs, offset, bytes, flags);
@@ -1186,8 +1186,6 @@ int coroutine_fn blk_co_pwritev_part(BlockBackend *blk, int64_t offset,
int ret;
BlockDriverState *bs;
- blk_wait_while_drained(blk);
-
/* Call blk_bs() only after waiting, the graph may have changed */
bs = blk_bs(blk);
trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
@@ -1262,6 +1260,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
.ret = NOT_DONE,
};
+ blk_inc_in_flight(blk);
if (qemu_in_coroutine()) {
/* Fast-path if already in coroutine context */
co_entry(&rwco);
@@ -1270,6 +1269,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
bdrv_coroutine_enter(blk_bs(blk), co);
BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
}
+ blk_dec_in_flight(blk);
return rwco.ret;
}
@@ -1388,9 +1388,7 @@ static void blk_aio_read_entry(void *opaque)
QEMUIOVector *qiov = rwco->iobuf;
if (rwco->blk->quiesce_counter) {
- blk_dec_in_flight(rwco->blk);
blk_wait_while_drained(rwco->blk);
- blk_inc_in_flight(rwco->blk);
}
assert(qiov->size == acb->bytes);
@@ -1406,9 +1404,7 @@ static void blk_aio_write_entry(void *opaque)
QEMUIOVector *qiov = rwco->iobuf;
if (rwco->blk->quiesce_counter) {
- blk_dec_in_flight(rwco->blk);
blk_wait_while_drained(rwco->blk);
- blk_inc_in_flight(rwco->blk);
}
assert(!qiov || qiov->size == acb->bytes);
next prev parent reply other threads:[~2020-04-02 17:11 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-31 8:46 bdrv_drained_begin deadlock with io-threads Dietmar Maurer
2020-03-31 9:17 ` Dietmar Maurer
2020-03-31 9:33 ` Dietmar Maurer
2020-03-31 12:58 ` Kevin Wolf
2020-03-31 14:32 ` Dietmar Maurer
2020-03-31 14:53 ` Vladimir Sementsov-Ogievskiy
2020-03-31 15:24 ` Dietmar Maurer
2020-03-31 15:37 ` Kevin Wolf
2020-03-31 16:18 ` Dietmar Maurer
2020-04-01 10:37 ` Kevin Wolf
2020-04-01 15:37 ` Dietmar Maurer
2020-04-01 15:50 ` Dietmar Maurer
2020-04-01 18:12 ` Kevin Wolf
2020-04-01 18:28 ` Dietmar Maurer
2020-04-01 18:44 ` Kevin Wolf
2020-04-02 6:48 ` Dietmar Maurer
2020-04-02 9:10 ` Dietmar Maurer
2020-04-02 12:14 ` Kevin Wolf
2020-04-02 14:25 ` Kevin Wolf
2020-04-02 15:40 ` Dietmar Maurer
2020-04-02 16:47 ` Kevin Wolf
2020-04-02 17:10 ` Kevin Wolf [this message]
2020-04-03 6:48 ` Thomas Lamprecht
2020-04-03 8:26 ` Dietmar Maurer
2020-04-03 8:47 ` Kevin Wolf
2020-04-03 16:31 ` Dietmar Maurer
2020-04-06 8:31 ` Kevin Wolf
2020-04-02 15:44 ` Dietmar Maurer
2020-04-01 18:35 ` Kevin Wolf
2020-04-02 9:21 ` Dietmar Maurer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200402171007.GF4006@linux.fritz.box \
--to=kwolf@redhat.com \
--cc=dietmar@proxmox.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.