From: Hanna Reitz <hreitz@redhat.com>
To: qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>,
qemu-devel@nongnu.org
Subject: [PATCH] block: Fix BB.root changing across bdrv_next()
Date: Tue, 1 Mar 2022 18:39:14 +0100 [thread overview]
Message-ID: <20220301173914.12279-1-hreitz@redhat.com> (raw)
bdrv_next() has no guarantee that its caller has stopped all block graph
operations; for example, bdrv_flush_all() does not.
The latter can actually provoke such operations, because its
bdrv_flush() call, which runs a coroutine (bdrv_co_flush()), may run
this coroutine in a different AioContext than the main one, and then
when this coroutine is done and invokes aio_wait_kick(), the monitor may
get a chance to run and start executing some graph-modifying QMP
command.
One example for this is when the VM encounters an I/O error on a block
device and stops, triggering a bdrv_flush_all(), and a blockdev-mirror
is started simultaneously on a block node in an I/O thread. When
bdrv_flush_all() comes to that node[1] and flushes it, the
aio_wait_kick() at the end of bdrv_co_flush_entry() may cause the
monitor to process the mirror request, and mirror_start_job() will then
replace the node by a mirror filter node, before bdrv_flush_all()
resumes and can invoke bdrv_next() again to continue iterating.
[1] Say there is a BlockBackend on top of the node in question, and so
bdrv_next() finds that BB and returns the node as the BB's blk_bs().
bdrv_next() will bdrv_ref() the node such that it remains valid through
bdrv_flush_all()'s iteration, and drop the reference when it is called
the next time.
The problem is that bdrv_next() does not store to which BDS it retains a
strong reference when the BDS is a BB's child, so on the subsequent
call, it will just invoke blk_bs() again and bdrv_unref() the returned
node -- but as the example above shows, this node might be a different
one than the one that was bdrv_ref()-ed before. This can lead to a
use-after-free (for the mirror filter node in our example), because this
negligent bdrv_unref() would steal a strong reference from someone else.
We can solve this problem by always storing the returned (and strongly
referenced) BDS in BdrvNextIterator.bs. When we want to drop the strong
reference of a BDS previously returned, always drop BdrvNextIterator.bs
instead of using other ways of trying to figure out what that BDS was
that we returned last time.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2058457
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
Sadly, I didn't find a nice way to test this, primarily because I
believe reproducing this requires a bdrv_flush_all() to come from the
VM (without QMP command). The only way I know that this can happen is
when the VM encounters an I/O error and responds with stopping the
guest.
It's also anything but easily reproducible, and I can't think of a way
to improve on that, because it really seems to be based on chance
whether the aio_wait_kick() wakes up the monitor and has it process an
incoming QMP command.
---
block/block-backend.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index 4ff6b4d785..c822f257dc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -575,7 +575,7 @@ BlockBackend *blk_next(BlockBackend *blk)
* the monitor or attached to a BlockBackend */
BlockDriverState *bdrv_next(BdrvNextIterator *it)
{
- BlockDriverState *bs, *old_bs;
+ BlockDriverState *bs, *old_bs = it->bs;
/* Must be called from the main loop */
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
@@ -586,8 +586,6 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
if (it->phase == BDRV_NEXT_BACKEND_ROOTS) {
BlockBackend *old_blk = it->blk;
- old_bs = old_blk ? blk_bs(old_blk) : NULL;
-
do {
it->blk = blk_all_next(it->blk);
bs = it->blk ? blk_bs(it->blk) : NULL;
@@ -601,11 +599,12 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
if (bs) {
bdrv_ref(bs);
bdrv_unref(old_bs);
+ it->bs = bs;
return bs;
}
it->phase = BDRV_NEXT_MONITOR_OWNED;
- } else {
- old_bs = it->bs;
+ /* Start from the first monitor-owned BDS */
+ it->bs = NULL;
}
/* Then return the monitor-owned BDSes without a BB attached. Ignore all
--
2.34.1
next reply other threads:[~2022-03-01 17:41 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-01 17:39 Hanna Reitz [this message]
2022-03-11 15:20 ` [PATCH] block: Fix BB.root changing across bdrv_next() Hanna Reitz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220301173914.12279-1-hreitz@redhat.com \
--to=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).