From: John Levon <levon@movementarian.org>
To: Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: Deadlock with SATA CD I/O and eject
Date: Mon, 18 Sep 2023 18:28:36 +0100
Message-ID: <ZQiIxERjYmZb0v4l@movementarian.org>


Observed with a base of qemu 6.2.0, but from code inspection it looks to me
like the problem is still present in upstream master. Apologies if I have
missed a fix in this area.

Symptom: run a UEFI-booted SATA CD Windows installer. When it hits the
"Loading files..." screen, run an eject, e.g.:

virsh qemu-monitor-command 64c6e190-ea7f-49e2-b2d5-6ba1814b00ae '{"execute":"eject", "arguments": { "id": "sata0-0-0" } }'

qemu will get stuck like so:

(gdb) bt
#0  0x00007f8ba4b16036 in ppoll () from /lib64/libc.so.6
#1  0x0000561813c48ed5 in ppoll (__ss=0x0, __timeout=0x7ffcbd981a70, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:62
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=999896128) at ../util/qemu-timer.c:348
#3  0x0000561813c29be9 in fdmon_poll_wait (ctx=0x56181516e070, ready_list=0x7ffcbd981af0, timeout=999896128) at ../util/fdmon-poll.c:80
#4  0x0000561813c297e1 in aio_poll (ctx=ctx@entry=0x56181516e070, blocking=blocking@entry=true) at ../util/aio-posix.c:607
#5  0x0000561813ae2fad in bdrv_do_drained_begin (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x56181533fcc0) at ../block/io.c:483
#6  bdrv_do_drained_begin (bs=0x56181533fcc0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at ../block/io.c:446
#7  0x0000561813ad9982 in blk_drain (blk=0x5618161c1f10) at ../block/block-backend.c:1741
#8  0x0000561813ad9b8c in blk_remove_bs (blk=blk@entry=0x5618161c1f10) at ../block/block-backend.c:852
#9  0x000056181382b8ab in blockdev_remove_medium (has_device=<optimized out>, device=<optimized out>, has_id=<optimized out>, id=<optimized out>, errp=0x7ffcbd981c78) at ../block/qapi-sysemu.c:232
#10 0x000056181382bfb1 in qmp_eject (has_device=<optimized out>, device=0x0, has_id=<optimized out>, id=0x561815e6efe0 "sata0-0-0", has_force=<optimized out>, force=<optimized out>, errp=0x7ffcbd981c78) at ../block/qapi-sysemu.c:45

We are stuck forever here:

 351 static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
 352                                   bool poll)
...
 380     if (poll) {
 381         BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
 382     }

This is because the blk root's ->in_flight is 1, as tested by the poll
condition in blk_root_drained_poll().
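
For reference, that poll condition looks roughly like this (paraphrased from
block/block-backend.c in master; treat it as a sketch, the exact shape varies
by version):

static bool blk_root_drained_poll(BdrvChild *child)
{
    BlockBackend *blk = child->opaque;
    bool busy = false;

    assert(qatomic_read(&blk->quiesce_counter));
    if (blk->dev_ops && blk->dev_ops->drained_poll) {
        busy = blk->dev_ops->drained_poll(blk->dev_opaque);
    }
    /* The drain keeps polling while any request is still in flight. */
    return busy || !!blk->in_flight;
}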


Our blk->in_flight user is stuck here:

1298 static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
...
1310         blk_dec_in_flight(blk);
1311         qemu_co_queue_wait(&blk->queued_requests, &blk->queued_requests_lock);
1312         blk_inc_in_flight(blk);

Note that before entering this stanza, blk->in_flight was 2. This turns out
to be due to the IDE ATAPI code. In particular, the guest VM is generating
lots of read I/O. The first I/O arrives at blk via:

cd_read_sector()->ide_buffered_readv()->blk_aio_preadv()
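
Sketching cd_read_sector() from memory (hw/ide/atapi.c, most of the body
elided; illustrative only): it submits an asynchronous buffered read, which
is what takes the first blk->in_flight reference, and registers
cd_read_sector_cb() to run on completion:

static int cd_read_sector(IDEState *s)
{
    ...
    /* Asynchronous read: blk_aio_preadv() underneath bumps blk->in_flight,
     * and cd_read_sector_cb() runs when the read completes. */
    ide_buffered_readv(s, (int64_t)s->lba << 2, &s->bus->dma->qiov, 4,
                       cd_read_sector_cb, s);
    ...
}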

This initial I/O completes via:

blk_aio_read_entry()->blk_aio_complete()

1560 static void blk_aio_complete(BlkAioEmAIOCB *acb)
1561 {
1562     if (acb->has_returned) {
1563         acb->common.cb(acb->common.opaque, acb->rwco.ret);
1564         blk_dec_in_flight(acb->rwco.blk);
1565         qemu_aio_unref(acb);
1566     }
1567 }

Line 1564 is what we need in order to bring blk->in_flight down to zero, but
it is never reached! This is because of what happens at line 1563:

acb->common.cb()->cd_read_sector_cb()->ide_atapi_cmd_reply_end()->cd_read_sector_sync()->blk_pread()

That is, the I/O completion callback in the ATAPI code itself triggers more
(synchronous) I/O.
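
The synchronous leg at the end of that chain looks roughly like this (again
paraphrasing hw/ide/atapi.c; note the blk_pread() signature differs between
6.2 and master, this sketch follows master):

static int cd_read_sector_sync(IDEState *s)
{
    int ret;
    ...
    /* Synchronous read issued from inside the AIO completion callback,
     * i.e. while blk->in_flight is still held by the first request. */
    ret = blk_pread(s->blk, (int64_t)s->lba << ATAPI_SECTOR_BITS,
                    ATAPI_SECTOR_SIZE, s->io_buffer, 0);
    ...
}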

In the meantime, we have started processing the blk_drain() code, so by the
time this blk_pread() actually gets handled, quiesce is set, and we get stuck
in blk_wait_while_drained().
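
Spelling out the elided part of blk_wait_while_drained() quoted above (sketch
of current master):

static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
{
    assert(blk->in_flight > 0);

    if (qatomic_read(&blk->quiesce_counter) && !blk->disable_request_queuing) {
        qemu_mutex_lock(&blk->queued_requests_lock);
        /* The blk_pread() from cd_read_sector_sync() parks here: it drops
         * its own in_flight reference, but the first request's reference
         * is still held by the blocked completion callback, so the drain
         * never finishes and nobody ever restarts this queue. */
        blk_dec_in_flight(blk);
        qemu_co_queue_wait(&blk->queued_requests, &blk->queued_requests_lock);
        blk_inc_in_flight(blk);
        qemu_mutex_unlock(&blk->queued_requests_lock);
    }
}

So we have a cycle: the drain waits for in_flight to reach zero, while the
request that would drop the last reference is itself waiting for the drain to
end.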

I don't know the qemu block stack well enough to propose an actual fix.

Experimentally, waiting for ->in_flight to drop to zero *before* we quiesce
in blk_remove_bs(), via an AIO_WAIT_WHILE(), avoids the symptom, but I'm
pretty sure that's just a band-aid rather than a fix for the underlying
deadlock.
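
Concretely, the experiment was along these lines (sketch only, against
blk_remove_bs() in block/block-backend.c; not a proposed fix):

void blk_remove_bs(BlockBackend *blk)
{
    ...
    /* Experiment: let the ATAPI callback chain finish its synchronous
     * read before the drain raises quiesce_counter, so the blk_pread()
     * never gets parked in blk_wait_while_drained(). */
    AIO_WAIT_WHILE(blk_get_aio_context(blk),
                   qatomic_read(&blk->in_flight) > 0);

    blk_drain(blk);
    ...
}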

Any suggestions/clues/thoughts?

thanks
john

