From: Kevin Wolf <kwolf@redhat.com>
To: Wesley Hershberger <wesley.hershberger@canonical.com>
Cc: qemu-devel@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
qemu-block@nongnu.org,
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Subject: Re: [PATCH] block: Add missing null checks during query-named-block-nodes
Date: Mon, 27 Oct 2025 10:55:16 +0100 [thread overview]
Message-ID: <aP9BhBEUSuM0ougc@redhat.com> (raw)
In-Reply-To: <20251024-second-fix-3149-v1-1-d997fa3d5ce2@canonical.com>
Am 24.10.2025 um 20:07 hat Wesley Hershberger geschrieben:
> Some operations insert an implicit node above the top node in the block
> graph (e.g. block-stream or blockdev-backup). The implicit node is
> removed when the operation finishes. If QMP query-named-block-nodes is
> called while the operation is removing the implicit node, it may observe
> a node with no children, causing a segfault.
How can a QMP command run in the middle of removing a node? Is this the
real bug?
You say block-stream is affected, so let's look at stream.c. The
interesting part here is bdrv_cor_filter_drop(s->cor_filter_bs).
bdrv_drop_filter() calls bdrv_drained_begin() and bdrv_graph_wrlock(),
which can run arbitrary callbacks, but not QMP commands. It's too eary
anyway, the filter is still in the graph at this point.
Between bdrv_replace_node_common(), which removes the node from its
parent, and bdrv_unref(cor_filter_bs), I don't see any place that could
run a QMP command.
Does cor_filter_bs have a refcount > 1 before running stream_prepare()
or stream_clean()?
Aha, your previous commit message [1] is actually clearer on this:
The cor_filter_bs was added to the blockjob as the main BDS (by
passing it to block_job_create), so bdrv_cor_filter_drop doesn't
actually remove it from global_bdrv_states.
[1] https://patchew.org/QEMU/20251021-fix-3149-v2-1-5ffbe701e964@canonical.com/
So we _are_ creating a state in which cor_filter_bs still exists, but
doesn't have a child any more. Which is rather untypical for a filter
(in fact, it's against the definition of a filter). And your two
different patches address this from two different angles:
1. Don't even let this situation arise. We need to make sure that
cor_filter_bs never exists without a child - or at least, that it's
happening only for a short time while we know that no other code is
running. This is what your previous patch attempted.
2. Make sure that everything else in QEMU can deal with a filter node
that doesn't have a child. This is what this one does.
Hanna, do you have an opinion on these two options?
I'm not sure myself, but I see that both aren't mutually exclusive. I
would say that having 1. is certainly a good thing that makes everything
else simpler and less likely to fail, so that's the one I would take in
any case. I'm not completely sure if that means v2 of "stream: Remove
bdrv from job in .clean()", or if another version that just removes this
one node from the job would be better. We do have bdrv_drain_all_begin()
later in stream_prepare(), which must be assumed to run any sorts of
callbacks, so removing the node in stream_clean() might be too late in
some cases.
And then we could think of having this patch for 2., probably split
into two patches - though what tells us that this is complete? I would
be surprised if there aren't more places in QEMU that assume that a
filter node has exactly one child. So I think in this case we would have
to audit the rest of the block layer to make sure we caught all of them.
Hm, I think I am relatively sure now actually that 2. is a bad idea...
So what if we don't actually do 2., but then add an assertion to
bdrv_cor_filter_drop() that verifies that the refcount is 1 before
dropping the filter node? This should help us make sure that the patch
for 1. actually does what it's supposed to do.
> This is hypothesized to only affect the block-stream operation as other
> operations use the workaround bdc4c4c5e372756a5ba3fb3a61e585b02f0dd7f4
> or do not detach their children during cleanup (see
> 3108a15cf09865456d499b08fe14e3dbec4ccbb3).
>
> This backtrace was observed in #3149 on a relatively recent build. The
> bs passed to bdrv_refresh_filename is the cor_filter_bs from the
> block-stream operation; bs->implicit was "true".
>
> 0 bdrv_refresh_filename (bs=0x5efed72f8350)
> at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
> 1 0x00005efea73cf9dc in bdrv_block_device_info
> (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
> at block/qapi.c:62
> 2 0x00005efea7391ed3 in bdrv_named_nodes_list
> (flat=<optimized out>, errp=0x7ffeb829ebd8)
> at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
> 3 0x00005efea7471993 in qmp_query_named_block_nodes
> (has_flat=<optimized out>, flat=<optimized out>, errp=0x7ffeb829ebd8)
> at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
> 4 qmp_marshal_query_named_block_nodes
> (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
> at qapi/qapi-commands-block-core.c:553
> 5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0)
> at qapi/qmp-dispatch.c:128
> 6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430)
> at util/async.c:219
> ...
This is one change...
> The get_allocated_file_size change resolves a second segfault after the
> first was resolved. Here, bdrv_filter_bs returns NULL as the
> bs (cor_filter_bs) has no children:
>
> 0 bdrv_co_get_allocated_file_size (bs=<optimized out>)
> at /usr/src/qemu-1:10.1.0+ds-5ubuntu2+test8/b/qemu/block.c:6018
> 1 0x0000631d078522be in bdrv_co_get_allocated_file_size_entry
> (opaque=0x7ffd375c5dd0) at block/block-gen.c:288
> 2 0x0000631d07929ec2 in coroutine_trampoline
> (i0=<optimized out>, i1=<optimized out>)
> at util/coroutine-ucontext.c:175
> ...
...and this is logically a separate one.
So if we were to go down this route, I think we would have a full patch
serie to make filter nodes without a child safe. These two parts would
be separate patches, and I'm almost sure that we would get more patches
for the series to make it fully safe everywhere.
As I said above, I think this is too hard to get correct to be a good
idea, though.
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3149
> Buglink: https://bugs.launchpad.net/bugs/2126951
> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
> ---
> As discussed in the previous thread:
> https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg05458.html
>
> This resolves the issue with my reproducer.
>
> make check-block passes locally.
>
> Please let me know if any adjustments to the patch are needed.
>
> Thanks for the quick & helpful reviews!
Sorry, Wesley, that this turns out to be a bit more complicated! We'll
probably need some further discussion before we can know if or which
adjustments to the patch are needed.
Before I send this, I just had another thought: Why does copy-on-read
even drop the child in bdrv_cor_filter_drop()? Wouldn't the normal
mode of operation be that for a short time you have both the cor node
and its parent point to the same child node, and that would just work?
I see commit messages about conflicting node permissions (3108a15cf09 by
Vladimir), but I don't understand this. A filter without parents
shouldn't try to take any permissions.
So another option for never letting the situation arise would be that
cor_filter_bs just keeps its child until it's really deleted.
Vladimir, do you remember what the specific problem was?
Kevin
next prev parent reply other threads:[~2025-10-27 9:56 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-24 18:07 [PATCH] block: Add missing null checks during query-named-block-nodes Wesley Hershberger
2025-10-25 8:27 ` Vladimir Sementsov-Ogievskiy
2025-10-27 9:55 ` Kevin Wolf [this message]
2025-10-27 16:04 ` Wesley Hershberger
2025-10-27 18:45 ` Kevin Wolf
2025-10-27 20:42 ` Vladimir Sementsov-Ogievskiy
2025-10-27 20:55 ` Vladimir Sementsov-Ogievskiy
2025-10-27 19:44 ` Vladimir Sementsov-Ogievskiy
2025-10-28 10:29 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aP9BhBEUSuM0ougc@redhat.com \
--to=kwolf@redhat.com \
--cc=hreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@yandex-team.ru \
--cc=wesley.hershberger@canonical.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.