From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, den@virtuozzo.com,
    qemu-devel@nongnu.org, mreitz@redhat.com, dplotnikov@virtuozzo.com
Subject: [Qemu-devel] [PATCH 4/4] block-backend: Queue requests while drained
Date: Thu, 25 Jul 2019 18:27:04 +0200
Message-Id: <20190725162704.12622-5-kwolf@redhat.com>
In-Reply-To: <20190725162704.12622-1-kwolf@redhat.com>
References: <20190725162704.12622-1-kwolf@redhat.com>

This fixes devices like IDE that can still start new requests from I/O
handlers in the CPU thread while the block backend is drained.

The basic assumption is that in a drain section, no new requests should
be allowed through a BlockBackend (blk_drained_begin/end don't exist,
we get drain sections only on the node level). However, there are two
special cases where requests should not be queued:

1. Block jobs: We already make sure that block jobs are paused in a
   drain section, so they won't start new requests. However, if the
   drain_begin is called on the job's BlockBackend first, it can happen
   that we deadlock because the job stays busy until it reaches a pause
   point - which it can't if its requests aren't processed any more.

   The proper solution here would be to make all requests through the
   job's filter node instead of using a BlockBackend. For now, just
   disabling request queuing on the job BlockBackend is simpler.

2. In test cases where making requests through bdrv_* would be
   cumbersome because we'd need a BdrvChild. As we already have the
   functionality to disable request queuing from 1., use it in tests,
   too, for convenience.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
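As a quick orientation before the diff (illustration only, not part of
the patch itself): the mechanism boils down to parking request
coroutines in a CoQueue while the backend is quiesced and restarting
them when the drain section ends. The sketch below condenses the
block/block-backend.c changes; the "Sketch" names are made up for this
note, and the in-flight accounting and AioContext details of the real
code are omitted:

/* Illustration only -- a condensed view of the mechanism added below.
 * Assumes the QEMU coroutine primitives from qemu/coroutine.h. */

typedef struct BlockBackendSketch {
    int quiesce_counter;          /* > 0 while a drain section is active */
    CoQueue queued_requests;      /* coroutines parked during the drain */
    bool disable_request_queuing; /* opt-out for block jobs and tests */
} BlockBackendSketch;

/* Called at the start of a request: park until the drain section ends. */
static void blk_sketch_wait_while_drained(BlockBackendSketch *blk)
{
    if (blk->quiesce_counter && !blk->disable_request_queuing) {
        qemu_co_queue_wait(&blk->queued_requests, NULL);
    }
}

/* Called from drained_end: restart every parked request coroutine. */
static void blk_sketch_resume_requests(BlockBackendSketch *blk)
{
    while (qemu_co_enter_next(&blk->queued_requests, NULL)) {
        /* each iteration wakes one queued coroutine */
    }
}

Block jobs and some tests set disable_request_queuing so that a drained
job BlockBackend can still make progress toward its pause point.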
 include/sysemu/block-backend.h | 11 +++---
 block/backup.c                 |  1 +
 block/block-backend.c          | 69 +++++++++++++++++++++++++++++-----
 block/commit.c                 |  2 +
 block/mirror.c                 |  6 ++-
 blockjob.c                     |  3 ++
 tests/test-bdrv-drain.c        |  1 +
 7 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7320b58467..d453a4e9a1 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -104,6 +104,7 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm);
 
 void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_set_allow_aio_context_change(BlockBackend *blk, bool allow);
+void blk_set_disable_request_queuing(BlockBackend *blk, bool disable);
 void blk_iostatus_enable(BlockBackend *blk);
 bool blk_iostatus_is_enabled(const BlockBackend *blk);
 BlockDeviceIoStatus blk_iostatus(const BlockBackend *blk);
@@ -119,10 +120,10 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                                unsigned int bytes, QEMUIOVector *qiov,
-                               BdrvRequestFlags flags);
+                               BdrvRequestFlags flags, bool wait_while_drained);
 int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
                                 unsigned int bytes, QEMUIOVector *qiov,
-                                BdrvRequestFlags flags);
+                                BdrvRequestFlags flags, bool wait_while_drained);
 
 static inline int coroutine_fn blk_co_pread(BlockBackend *blk, int64_t offset,
                                             unsigned int bytes, void *buf,
@@ -130,7 +131,7 @@ static inline int coroutine_fn blk_co_pread(BlockBackend *blk, int64_t offset,
 {
     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
 
-    return blk_co_preadv(blk, offset, bytes, &qiov, flags);
+    return blk_co_preadv(blk, offset, bytes, &qiov, flags, true);
 }
 
 static inline int coroutine_fn blk_co_pwrite(BlockBackend *blk, int64_t offset,
@@ -139,7 +140,7 @@ static inline int coroutine_fn blk_co_pwrite(BlockBackend *blk, int64_t offset,
 {
     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
 
-    return blk_co_pwritev(blk, offset, bytes, &qiov, flags);
+    return blk_co_pwritev(blk, offset, bytes, &qiov, flags, true);
 }
 
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
diff --git a/block/backup.c b/block/backup.c
index 715e1d3be8..f66b2f4ee7 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -635,6 +635,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     if (ret < 0) {
         goto error;
     }
+    blk_set_disable_request_queuing(job->target, true);
 
     job->on_source_error = on_source_error;
     job->on_target_error = on_target_error;
diff --git a/block/block-backend.c b/block/block-backend.c
index fdd6b01ecf..603b281743 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -79,6 +79,9 @@ struct BlockBackend {
     QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
 
     int quiesce_counter;
+    CoQueue queued_requests;
+    bool disable_request_queuing;
+
     VMChangeStateEntry *vmsh;
     bool force_allow_inactivate;
 
@@ -339,6 +342,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
 
     block_acct_init(&blk->stats);
 
+    qemu_co_queue_init(&blk->queued_requests);
     notifier_list_init(&blk->remove_bs_notifiers);
     notifier_list_init(&blk->insert_bs_notifiers);
     QLIST_INIT(&blk->aio_notifiers);
@@ -1096,6 +1100,11 @@ void blk_set_allow_aio_context_change(BlockBackend *blk, bool allow)
     blk->allow_aio_context_change = allow;
 }
 
+void blk_set_disable_request_queuing(BlockBackend *blk, bool disable)
+{
+    blk->disable_request_queuing = disable;
+}
+
 static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
                                   size_t size)
 {
@@ -1127,13 +1136,26 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
     return 0;
 }
 
+static void blk_wait_while_drained(BlockBackend *blk)
+{
+    if (blk->quiesce_counter && !blk->disable_request_queuing) {
+        qemu_co_queue_wait(&blk->queued_requests, NULL);
+    }
+}
+
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                                unsigned int bytes, QEMUIOVector *qiov,
-                               BdrvRequestFlags flags)
+                               BdrvRequestFlags flags, bool wait_while_drained)
 {
     int ret;
-    BlockDriverState *bs = blk_bs(blk);
+    BlockDriverState *bs;
 
+    if (wait_while_drained) {
+        blk_wait_while_drained(blk);
+    }
+
+    /* Call blk_bs() only after waiting, the graph may have changed */
+    bs = blk_bs(blk);
     trace_blk_co_preadv(blk, bs, offset, bytes, flags);
 
     ret = blk_check_byte_request(blk, offset, bytes);
@@ -1156,11 +1178,17 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
 
 int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
                                 unsigned int bytes, QEMUIOVector *qiov,
-                                BdrvRequestFlags flags)
+                                BdrvRequestFlags flags, bool wait_while_drained)
 {
     int ret;
-    BlockDriverState *bs = blk_bs(blk);
+    BlockDriverState *bs;
+
+    if (wait_while_drained) {
+        blk_wait_while_drained(blk);
+    }
 
+    /* Call blk_bs() only after waiting, the graph may have changed */
+    bs = blk_bs(blk);
     trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
 
     ret = blk_check_byte_request(blk, offset, bytes);
@@ -1198,7 +1226,7 @@ static void blk_read_entry(void *opaque)
     QEMUIOVector *qiov = rwco->iobuf;
 
     rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, qiov->size,
-                              qiov, rwco->flags);
+                              qiov, rwco->flags, true);
     aio_wait_kick();
 }
 
@@ -1208,7 +1236,7 @@ static void blk_write_entry(void *opaque)
     QEMUIOVector *qiov = rwco->iobuf;
 
     rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, qiov->size,
-                               qiov, rwco->flags);
+                               qiov, rwco->flags, true);
     aio_wait_kick();
 }
 
@@ -1349,9 +1377,15 @@ static void blk_aio_read_entry(void *opaque)
     BlkRwCo *rwco = &acb->rwco;
     QEMUIOVector *qiov = rwco->iobuf;
 
+    if (rwco->blk->quiesce_counter) {
+        blk_dec_in_flight(rwco->blk);
+        blk_wait_while_drained(rwco->blk);
+        blk_inc_in_flight(rwco->blk);
+    }
+
     assert(qiov->size == acb->bytes);
     rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, acb->bytes,
-                              qiov, rwco->flags);
+                              qiov, rwco->flags, false);
     blk_aio_complete(acb);
 }
 
@@ -1361,9 +1395,15 @@ static void blk_aio_write_entry(void *opaque)
     BlkRwCo *rwco = &acb->rwco;
     QEMUIOVector *qiov = rwco->iobuf;
 
+    if (rwco->blk->quiesce_counter) {
+        blk_dec_in_flight(rwco->blk);
+        blk_wait_while_drained(rwco->blk);
+        blk_inc_in_flight(rwco->blk);
+    }
+
     assert(!qiov || qiov->size == acb->bytes);
     rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, acb->bytes,
-                               qiov, rwco->flags);
+                               qiov, rwco->flags, false);
     blk_aio_complete(acb);
 }
 
@@ -1482,6 +1522,8 @@ void blk_aio_cancel_async(BlockAIOCB *acb)
 
 int blk_co_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
 {
+    blk_wait_while_drained(blk);
+
     if (!blk_is_available(blk)) {
         return -ENOMEDIUM;
     }
@@ -1522,7 +1564,11 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
 
 int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes)
 {
-    int ret = blk_check_byte_request(blk, offset, bytes);
+    int ret;
+
+    blk_wait_while_drained(blk);
+
+    ret = blk_check_byte_request(blk, offset, bytes);
     if (ret < 0) {
         return ret;
     }
@@ -2004,7 +2050,7 @@ int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                                       int bytes, BdrvRequestFlags flags)
 {
     return blk_co_pwritev(blk, offset, bytes, NULL,
-                          flags | BDRV_REQ_ZERO_WRITE);
+                          flags | BDRV_REQ_ZERO_WRITE, true);
 }
 
 int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
@@ -2232,6 +2278,9 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
         if (blk->dev_ops && blk->dev_ops->drained_end) {
            blk->dev_ops->drained_end(blk->dev_opaque);
         }
+        while (qemu_co_enter_next(&blk->queued_requests, NULL)) {
+            /* Resume all queued requests */
+        }
     }
 }
 
diff --git a/block/commit.c b/block/commit.c
index 2c5a6d4ebc..408ae15389 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -350,6 +350,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     if (ret < 0) {
         goto fail;
     }
+    blk_set_disable_request_queuing(s->base, true);
     s->base_bs = base;
 
     /* Required permissions are already taken with block_job_add_bdrv() */
@@ -358,6 +359,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     if (ret < 0) {
         goto fail;
     }
+    blk_set_disable_request_queuing(s->top, true);
 
     s->backing_file_str = g_strdup(backing_file_str);
     s->on_error = on_error;
diff --git a/block/mirror.c b/block/mirror.c
index 7483051f8d..8d0a3a987d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -231,7 +231,8 @@ static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
         return;
     }
 
-    ret = blk_co_pwritev(s->target, op->offset, op->qiov.size, &op->qiov, 0);
+    ret = blk_co_pwritev(s->target, op->offset, op->qiov.size, &op->qiov, 0,
+                         false);
     mirror_write_complete(op, ret);
 }
 
@@ -1237,7 +1238,7 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
         switch (method) {
         case MIRROR_METHOD_COPY:
             ret = blk_co_pwritev(job->target, dirty_offset, dirty_bytes,
-                                 qiov ? &target_qiov : NULL, flags);
+                                 qiov ? &target_qiov : NULL, flags, false);
             break;
 
         case MIRROR_METHOD_ZERO:
@@ -1624,6 +1625,7 @@ static BlockJob *mirror_start_job(
         blk_set_force_allow_inactivate(s->target);
     }
     blk_set_allow_aio_context_change(s->target, true);
+    blk_set_disable_request_queuing(s->target, true);
 
     s->replaces = g_strdup(replaces);
     s->on_source_error = on_source_error;
diff --git a/blockjob.c b/blockjob.c
index 20b7f557da..73d9f1ba2b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -445,6 +445,9 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
 
     bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
+    /* Disable request queuing in the BlockBackend to avoid deadlocks on drain:
+     * The job reports that it's busy until it reaches a pause point. */
+    blk_set_disable_request_queuing(blk, true);
     blk_set_allow_aio_context_change(blk, true);
 
     /* Only set speed when necessary to avoid NotSupported error */
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 03fa1142a1..3fcf7c1c95 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -677,6 +677,7 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
                       &error_abort);
     s = bs->opaque;
     blk_insert_bs(blk, bs, &error_abort);
+    blk_set_disable_request_queuing(blk, true);
 
     blk_set_aio_context(blk, ctx_a, &error_abort);
     aio_context_acquire(ctx_a);
-- 
2.20.1
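
For readers following along, a caller-side view of the new
wait_while_drained parameter (hypothetical helpers, not functions from
this series): guest-driven requests pass true and may be parked until
drained_end, while block jobs disable queuing at creation and pass
false so they can still reach their pause point.

/* Illustration only: how call sites might choose the new parameter. */

/* Device path: may park in the backend's queue during a drain section. */
static int coroutine_fn device_read_sketch(BlockBackend *blk, int64_t offset,
                                           unsigned int bytes,
                                           QEMUIOVector *qiov)
{
    return blk_co_preadv(blk, offset, bytes, qiov, 0, true);
}

/* Job path: queuing was disabled via blk_set_disable_request_queuing(),
 * and the request itself also skips the wait. */
static int coroutine_fn job_write_sketch(BlockBackend *blk, int64_t offset,
                                         unsigned int bytes,
                                         QEMUIOVector *qiov)
{
    return blk_co_pwritev(blk, offset, bytes, qiov, 0, false);
}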