From: Fiona Ebner <f.ebner@proxmox.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
QEMU Developers <qemu-devel@nongnu.org>
Cc: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>,
John Snow <jsnow@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>,
"open list:Network Block Dev..." <qemu-block@nongnu.org>,
Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: deadlock when using iothread during backup_clean()
Date: Thu, 28 Sep 2023 10:06:10 +0200 [thread overview]
Message-ID: <44ff810b-8ec6-0f11-420a-6efa2c7c2475@proxmox.com> (raw)
In-Reply-To: <dd12f39d-a364-b186-2ad7-04343ea85e3f@redhat.com>
On 05.09.23 at 13:42, Paolo Bonzini wrote:
> On 9/5/23 12:01, Fiona Ebner wrote:
>> Can we assume block_job_remove_all_bdrv() to always hold the job's
>> AioContext?
>
> I think so, see job_unref_locked(), job_prepare_locked() and
> job_finalize_single_locked(). These call the callbacks that ultimately
> get to block_job_remove_all_bdrv().
>
>> And if yes, can we just tell bdrv_graph_wrlock() that it
>> needs to release that before polling to fix the deadlock?
>
> No, but I think it should be released and re-acquired in
> block_job_remove_all_bdrv() itself.
>
For fixing the backup cancel deadlock, I tried the following:
> diff --git a/blockjob.c b/blockjob.c
> index 58c5d64539..fd6132ebfe 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -198,7 +198,9 @@ void block_job_remove_all_bdrv(BlockJob *job)
> * one to make sure that such a concurrent access does not attempt
> * to process an already freed BdrvChild.
> */
> + aio_context_release(job->job.aio_context);
> bdrv_graph_wrlock(NULL);
> + aio_context_acquire(job->job.aio_context);
> while (job->nodes) {
> GSList *l = job->nodes;
> BdrvChild *c = l->data;
but unfortunately it's not enough. And I don't just mean the later
deadlock during bdrv_close() (via bdrv_cbw_drop()) mentioned in the
other mail.
Even when that deadlock happened not to trigger by chance, or when I
tried to avoid it with the additional change
> diff --git a/block.c b/block.c
> index e7f349b25c..02d2c4e777 100644
> --- a/block.c
> +++ b/block.c
> @@ -5165,7 +5165,7 @@ static void bdrv_close(BlockDriverState *bs)
> bs->drv = NULL;
> }
>
> - bdrv_graph_wrlock(NULL);
> + bdrv_graph_wrlock(bs);
> QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
> bdrv_unref_child(bs, child);
> }
guest IO would often get completely stuck after canceling the backup.
There's nothing obvious to me in the backtraces at that point: the vCPU
and main threads seem to be running as usual, while the IO thread is
stuck in aio_poll(), i.e. it never returns from the __ppoll() call. This
happens with both a VirtIO SCSI and a VirtIO block disk, and with both
aio=io_uring and aio=threads.
I should also mention that I'm running
> fio --name=file --size=4k --direct=1 --rw=randwrite --bs=4k --ioengine=psync --numjobs=5 --runtime=6000 --time_based
inside the guest while canceling the backup.
I'd be glad for any pointers on what to look for, and happy to provide
more information.
Best Regards,
Fiona