* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
@ 2025-10-06 17:21 ` Wesley Hershberger
2025-10-06 18:38 ` Bug Watch Updater
` (28 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2025-10-06 17:21 UTC (permalink / raw)
To: qemu-devel
** Bug watch added: gitlab.com/qemu-project/qemu/-/issues #3149
https://gitlab.com/qemu-project/qemu/-/issues/3149
** Also affects: qemu via
https://gitlab.com/qemu-project/qemu/-/issues/3149
Importance: Unknown
Status: Unknown
** Also affects: qemu (Ubuntu Questing)
Importance: Medium
Assignee: Wesley Hershberger (whershberger)
Status: Confirmed
** Also affects: qemu (Ubuntu Noble)
Importance: Undecided
Status: New
** Also affects: qemu (Ubuntu Jammy)
Importance: Undecided
Status: New
** Also affects: qemu (Ubuntu Plucky)
Importance: Undecided
Status: New
** Changed in: qemu (Ubuntu Jammy)
Status: New => Confirmed
** Changed in: qemu (Ubuntu Noble)
Status: New => Confirmed
** Changed in: qemu (Ubuntu Plucky)
Status: New => Confirmed
** Changed in: qemu (Ubuntu Jammy)
Importance: Undecided => Medium
** Changed in: qemu (Ubuntu Noble)
Importance: Undecided => Medium
** Changed in: qemu (Ubuntu Plucky)
Importance: Undecided => Medium
--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Unknown
Status in qemu package in Ubuntu:
Confirmed
Status in qemu source package in Jammy:
Confirmed
Status in qemu source package in Noble:
Confirmed
Status in qemu source package in Plucky:
Confirmed
Status in qemu source package in Questing:
Confirmed
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
This occurs in every version of QEMU shipped with Ubuntu, 22.04 through
25.10. I have not yet reproduced the bug using an upstream build.
I will link the upstream bug report here as soon as I've written it.
[ Reproducer ]
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
    virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex

domain="$1"
if [ -z "${domain}" ]; then
    echo "Missing domain name"
    exit 1
fi

./query-named-block-nodes.sh "${domain}" &
query_pid=$!

while [ -n "$(virsh list --uuid)" ]; do
    snap="snap0-$(uuidgen)"
    virsh snapshot-create-as "${domain}" \
        --name "${snap}" \
        --disk-only \
        --diskspec vda,snapshot=no \
        --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
        --atomic \
        --no-metadata
    virsh blockpull "${domain}" vdb
    while bjr=$(virsh blockjob "${domain}" vdb); do
        if [[ "$bjr" == *"No current block job for"* ]]; then
            break
        fi
    done
done

kill "${query_pid}"
```
Provision (`Ctrl + ]` after boot):
```sh
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
```
And run the script to cause the crash (you may need to manually kill
`query-named-block-nodes.sh`):
```sh
./blockrebase-crash.sh n0
```
[ Details ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the block job, since the job reaches the "concluded" state before the crash. I suspect one of the following:
- `stream_clean` is freeing or modifying the `cor_filter_bs` without holding a lock that it needs [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
@ 2025-10-06 18:38 ` Bug Watch Updater
2025-10-07 9:38 ` Jonas Jelten
` (27 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Bug Watch Updater @ 2025-10-06 18:38 UTC (permalink / raw)
To: qemu-devel
** Changed in: qemu
Status: Unknown => New
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
2025-10-06 18:38 ` Bug Watch Updater
@ 2025-10-07 9:38 ` Jonas Jelten
2025-10-08 7:03 ` Christian Ehrhardt
` (26 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Jonas Jelten @ 2025-10-07 9:38 UTC (permalink / raw)
To: qemu-devel
** Tags added: server-todo
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (2 preceding siblings ...)
2025-10-07 9:38 ` Jonas Jelten
@ 2025-10-08 7:03 ` Christian Ehrhardt
2025-10-21 20:01 ` Wesley Hershberger
` (25 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Christian Ehrhardt @ 2025-10-08 7:03 UTC (permalink / raw)
To: qemu-devel
** Tags removed: server-todo
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (3 preceding siblings ...)
2025-10-08 7:03 ` Christian Ehrhardt
@ 2025-10-21 20:01 ` Wesley Hershberger
2025-11-12 18:46 ` Bug Watch Updater
` (24 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2025-10-21 20:01 UTC (permalink / raw)
To: qemu-devel
** Also affects: qemu (Ubuntu Resolute)
Importance: Medium
Assignee: Wesley Hershberger (whershberger)
Status: Confirmed
** Changed in: qemu (Ubuntu Plucky)
Assignee: (unassigned) => Wesley Hershberger (whershberger)
** Changed in: qemu (Ubuntu Noble)
Assignee: (unassigned) => Wesley Hershberger (whershberger)
** Changed in: qemu (Ubuntu Jammy)
Assignee: (unassigned) => Wesley Hershberger (whershberger)
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (4 preceding siblings ...)
2025-10-21 20:01 ` Wesley Hershberger
@ 2025-11-12 18:46 ` Bug Watch Updater
2026-01-12 12:27 ` Athos Ribeiro
` (23 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Bug Watch Updater @ 2025-11-12 18:46 UTC (permalink / raw)
To: qemu-devel
** Changed in: qemu
Status: New => Fix Released
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Confirmed
Status in qemu source package in Jammy:
Confirmed
Status in qemu source package in Noble:
Confirmed
Status in qemu source package in Plucky:
Confirmed
Status in qemu source package in Questing:
Confirmed
Status in qemu source package in Resolute:
Confirmed
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
25.10. I have not yet reproduced the bug using an upstream build.
I will link the upstream bug report here as soon as I've written it.
[ Reproducer ]
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query_named_block_nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
Provision (`Ctrl + ]` after boot):
```sh
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the script to cause the crash (you may need to manually kill
query-named-block-nodes.sh):
```sh
./blockrebase-crash.sh n0
```
[ Details ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (5 preceding siblings ...)
2025-11-12 18:46 ` Bug Watch Updater
@ 2026-01-12 12:27 ` Athos Ribeiro
2026-01-12 14:19 ` Wesley Hershberger
` (22 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Athos Ribeiro @ 2026-01-12 12:27 UTC (permalink / raw)
To: qemu-devel
Since we are 3 days away from the plucky EOL, should we close that task
as wontfix?
https://lists.ubuntu.com/archives/ubuntu-security-announce/2026-January/010065.html
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Confirmed
Status in qemu source package in Jammy:
Confirmed
Status in qemu source package in Noble:
Confirmed
Status in qemu source package in Plucky:
Confirmed
Status in qemu source package in Questing:
Confirmed
Status in qemu source package in Resolute:
Confirmed
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (6 preceding siblings ...)
2026-01-12 12:27 ` Athos Ribeiro
@ 2026-01-12 14:19 ` Wesley Hershberger
2026-02-09 16:19 ` Wesley Hershberger
` (21 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-01-12 14:19 UTC (permalink / raw)
To: qemu-devel
Thanks for the ping, Athos; I won't be able to get to this before the
Plucky EOL.
The fix for this landed as 9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
upstream, present in v10.2.0-rc1 and released in v10.2.0.
** Changed in: qemu (Ubuntu Plucky)
Status: Confirmed => Won't Fix
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Confirmed
Status in qemu source package in Jammy:
Confirmed
Status in qemu source package in Noble:
Confirmed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Confirmed
Status in qemu source package in Resolute:
Confirmed
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (7 preceding siblings ...)
2026-01-12 14:19 ` Wesley Hershberger
@ 2026-02-09 16:19 ` Wesley Hershberger
2026-02-09 16:36 ` Launchpad Bug Tracker
` (20 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-02-09 16:19 UTC (permalink / raw)
To: qemu-devel
** Description changed:
[ Impact ]
When running `block-stream` and `query-named-block-nodes` concurrently,
a null-pointer dereference causes QEMU to segfault.
+ The original reporter of this issue experienced the bug while performing
+ concurrent libvirt `virDomainBlockPull` calls on the same VM/different
+ disks. The race condition occurs at the end of the `block-stream` QMP;
+ libvirt's handler for a completed `block-stream`
+ (`qemuBlockJobProcessEventCompletedPull` [1]) calls
+ `query-named-block-nodes` (see "libvirt trace" below for a full trace).
+
This occurs in every version of QEMU shipped with Ubuntu, 22.04 through
- 25.10. I have not yet reproduced the bug using an upstream build.
-
- I will link the upstream bug report here as soon as I've written it.
-
- [ Reproducer ]
+ 25.10.
+
+ [1] qemuBlockJobProcessEventCompletedPull
+
+ [ Test Plan ]
+
+ ```
+ sudo apt install libvirt-daemon-system virtinst
+ ```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
- ./query_named_block_nodes.sh "${domain}" &
+ ./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
- Provision (`Ctrl + ]` after boot):
- ```sh
- wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
+ `provision.sh` (`Ctrl + ]` after boot):
+ ```sh
+ #!/bin/bash
+
+ set -ex
+
+ wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the script to cause the crash (you may need to manually kill
query-named-block-nodes.sh):
```sh
+ chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
+ ./provision.sh
./blockrebase-crash.sh n0
```
- [ Details ]
+ Expected behavior: `blockrebase-crash.sh` runs until "No space left on
+ device"
+
+ Actual behavior: QEMU crashes after a few iterations:
+ ```
+ Block Pull: [81.05 %]+ bjr=
+ + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
+ ++ virsh blockjob n0 vdb
+ Block Pull: [97.87 %]+ bjr=
+ + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
+ ++ virsh blockjob n0 vdb
+ error: Unable to read from monitor: Connection reset by peer
+ error: Unable to read from monitor: Connection reset by peer
+ + bjr=
+ ++ virsh list --uuid
+ + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
+ ++ uuidgen
+ + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
+ error: Requested operation is not valid: domain is not running
+ Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ + virsh blockpull n0 vdb
+ error: Requested operation is not valid: domain is not running
+ error: Requested operation is not valid: domain is not running
+
+ wesley@nv0:~$ error: Requested operation is not valid: domain is not running
+ ```
+
+ [ Where problems could occur ]
+
+ The only codepaths affected by this change are `block-stream` and
+ `blockdev-backup` [1][2]. If the code is somehow broken, we would expect
+ to see failures when executing these QMP commands (or the libvirt APIs
+ that use them, `virDomainBlockPull` and `virDomainBackupBegin` [3][4]).
+
+ As noted in the upstream commit message, the change does cause an
+ additional flush to occur during `blockdev-backup` QMPs.
+
+ The patch that was ultimately merged upstream was a revert of most of
+ [5]. _That_ patch was a workaround for a blockdev permissions issue that
+ was later resolved in [6] (see the end of [7] and replies for upstream
+ discussion). Both [5] and [6] are present in QEMU 6.2.0, so the
+ assumptions that led us to the upstream solution hold for Jammy.
+
+ [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
+ [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
+ [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
+ [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
+ [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
+ [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
+ [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
+
+ [ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
- The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
+ The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
+
+ Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
+ should not be able to observe a NULL list of children.
+
+ `query-named-block-nodes` iterates the global list of block nodes
+ `graph_bdrv_states` [5]. The offending block node (the `cor_filter_bs`,
+ added during a `block-stream`) was removed from the list of block nodes
+ _for the disk_ when the operation finished, but not removed from the
+ global list of block nodes until later (this is the window for the
+ race). The patch keeps the block node in the disk's list until it is
+ dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
+ [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
+ [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
+
+ [ libvirt trace ]
+ `qemuBlockJobProcessEventCompletedPull` [1]
+ `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
+ `qemuBlockGetNamedNodeData` [3]
+ `qemuMonitorBlockGetNamedNodeData` [4]
+ `qemuMonitorJSONBlockGetNamedNodeData` [5]
+ `qemuMonitorJSONQueryNamedBlockNodes` [6]
+
+ [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
+ [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
+ [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
+ [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
+ [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
+ [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
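The list-membership race described in the updated description can be modeled abstractly. The following is a hypothetical Python sketch, not QEMU's actual data structures: `Node` and `query` are invented names, and `graph_bdrv_states` is borrowed from block.c only as a label. It shows why a node that is still visible in the global list, but whose per-disk child links are already gone, crashes the query, and why removing it from both places together avoids the window:

```python
# Hypothetical model of the race, NOT QEMU code.

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)     # analogue of bs->children (QLIST)
        self.is_filter = bool(self.children)

def query(global_nodes):
    """Analogue of query-named-block-nodes: filters report their child's name."""
    result = []
    for n in global_nodes:
        if n.is_filter:
            # Unconditional dereference, like bdrv_refresh_filename() on a
            # filter node; IndexError here stands in for the NULL deref.
            result.append((n.name, n.children[0].name))
        else:
            result.append((n.name, None))
    return result

base = Node("base")
cor = Node("cor-filter", [base])
graph_bdrv_states = [base, cor]            # analogue of the global node list

# Buggy teardown order: the filter's child links are dropped first...
cor.children.clear()
# ...window: a concurrent query still sees "cor-filter" in the global list.
try:
    query(graph_bdrv_states)
    crashed = False
except IndexError:
    crashed = True
assert crashed                             # the segfault analogue

# Modeled fix (loosely after commit 9dbfd4e28dd1): the node stays fully
# linked until it is removed from everything at the end of the blockjob,
# so a query never observes a half-unlinked filter.
base2 = Node("base")
cor2 = Node("cor-filter", [base2])
graph2 = [base2, cor2]
graph2.remove(cor2)
cor2.children.clear()
assert query(graph2) == [("base", None)]
```

The model deliberately collapses locking into ordering: in QEMU the window exists because the two unlink steps happen at different times, not because a lock is missing.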
** Changed in: qemu (Ubuntu Questing)
Status: Confirmed => In Progress
** Changed in: qemu (Ubuntu Noble)
Status: Confirmed => In Progress
** Changed in: qemu (Ubuntu Jammy)
Status: Confirmed => In Progress
** Changed in: qemu (Ubuntu Resolute)
Assignee: Wesley Hershberger (whershberger) => (unassigned)
** Changed in: qemu (Ubuntu Resolute)
Status: Confirmed => In Progress
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
In Progress
Status in qemu source package in Jammy:
In Progress
Status in qemu source package in Noble:
In Progress
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
In Progress
Status in qemu source package in Resolute:
In Progress
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls
`query-named-block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04 through
25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the script to cause the crash (you may need to manually kill
query-named-block-nodes.sh):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` QMPs.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since the job reaches the "concluded" state before the crash. I initially assumed the cause was one of the following:
- `stream_clean` frees or modifies the `cor_filter_bs` without holding a lock it needs [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately, the fix was neither of these [4]: `bdrv_refresh_filename`
should never be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
From: Launchpad Bug Tracker @ 2026-02-09 16:36 UTC (permalink / raw)
To: qemu-devel
** Merge proposal linked:
https://code.launchpad.net/~whershberger/ubuntu/+source/qemu/+git/qemu/+merge/500070
** Merge proposal linked:
https://code.launchpad.net/~whershberger/ubuntu/+source/qemu/+git/qemu/+merge/500071
** Merge proposal linked:
https://code.launchpad.net/~whershberger/ubuntu/+source/qemu/+git/qemu/+merge/500072
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
In Progress
Status in qemu source package in Jammy:
In Progress
Status in qemu source package in Noble:
In Progress
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
In Progress
Status in qemu source package in Resolute:
In Progress
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
From: Wesley Hershberger @ 2026-02-18 14:31 UTC (permalink / raw)
To: qemu-devel
** Changed in: qemu (Ubuntu Resolute)
Status: In Progress => Fix Committed
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
In Progress
Status in qemu source package in Noble:
In Progress
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
In Progress
Status in qemu source package in Resolute:
Fix Committed
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
From: Timo Aaltonen @ 2026-02-27 9:56 UTC (permalink / raw)
To: qemu-devel
Hello Wesley, or anyone else affected,
Accepted qemu into questing-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/qemu/1:10.1.0+ds-5ubuntu2.3 in a
few hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us getting this
update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from verification-needed-
questing to verification-done-questing. If it does not fix the bug for
you, please add a comment stating that, and change the tag to
verification-failed-questing. In either case, without details of your
testing we will not be able to proceed.
Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!
N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
** Changed in: qemu (Ubuntu Questing)
Status: In Progress => Fix Committed
** Tags added: verification-needed verification-needed-questing
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
In Progress
Status in qemu source package in Noble:
In Progress
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
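To make the race window concrete, here is a toy shell model of the two lists (plain word lists standing in for QEMU's per-disk child list and the global `graph_bdrv_states`; the node names are illustrative, not QEMU's actual structures):

```sh
#!/bin/sh
# Toy model only: two views of the block graph that should be torn down
# together.
disk_nodes="file-node cor-filter fmt-node"
graph_bdrv_states="file-node cor-filter fmt-node"

# The blockjob finishes: the filter is dropped from the disk's list first...
disk_nodes="file-node fmt-node"

# ...but until the later global removal, a concurrent
# query-named-block-nodes walking graph_bdrv_states can still observe
# the half-torn-down filter node.
stale=no
for node in ${graph_bdrv_states}; do
  if [ "$node" = "cor-filter" ]; then
    stale=yes
  fi
done
```

The upstream fix [4] closes this window by keeping the filter in the disk's list until it is dropped at the end of the blockjob, so the two views never disagree.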
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
* [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.3)
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (11 preceding siblings ...)
2026-02-27 9:56 ` Timo Aaltonen
@ 2026-02-27 16:59 ` Ubuntu SRU Bot
2026-03-06 10:48 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Timo Aaltonen
` (16 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Ubuntu SRU Bot @ 2026-02-27 16:59 UTC (permalink / raw)
To: qemu-devel
All autopkgtests for the newly accepted qemu (1:10.1.0+ds-5ubuntu2.3) for questing have finished running.
The following regressions have been reported in tests triggered by the package:
architecture-properties/0.2.6 (ppc64el)
casper/25.10.2 (amd64, ppc64el)
edk2/2025.02-8ubuntu3 (amd64, armhf)
freedom-maker/0.34 (arm64)
incus/6.0.4-2 (arm64, s390x)
ironic-python-agent/10.2.0-3 (arm64)
kworkflow/20191112-1.2 (amd64)
libvirt/11.6.0-1ubuntu3.3 (arm64, ppc64el)
nova/3:32.0.0-0ubuntu1.1 (i386)
systemd/257.9-0ubuntu2.1 (armhf)
Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].
https://people.canonical.com/~ubuntu-archive/proposed-migration/questing/update_excuses.html#qemu
[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions
Thank you!
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
In Progress
Status in qemu source package in Noble:
In Progress
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (12 preceding siblings ...)
2026-02-27 16:59 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.3) Ubuntu SRU Bot
@ 2026-03-06 10:48 ` Timo Aaltonen
2026-03-11 17:57 ` Wesley Hershberger
` (15 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Timo Aaltonen @ 2026-03-06 10:48 UTC (permalink / raw)
To: qemu-devel
The questing, noble, and jammy uploads need to be rebased onto the
security update that went out earlier this week.
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (13 preceding siblings ...)
2026-03-06 10:48 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Timo Aaltonen
@ 2026-03-11 17:57 ` Wesley Hershberger
2026-03-12 14:58 ` Hector CAO
` (14 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-11 17:57 UTC (permalink / raw)
To: qemu-devel
Sorry for the delay on this; I've rebased the MPs, pushed new tags, and
uploaded test builds to my PPAs [1][2]. Still waiting on RISC-V, but
everything else is green.
[1] https://launchpad.net/~whershberger/+archive/ubuntu/lp2126951-updates
[2] https://launchpad.net/~whershberger/+archive/ubuntu/lp2126951-proposed
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
In Progress
Status in qemu source package in Noble:
In Progress
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-
server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
```
And run the scripts to cause the crash (you may need to manually kill
`query-named-block-nodes.sh`):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
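An aside on reading this log: the heavily backslash-escaped lines are not
corruption; they are how bash's xtrace (`set -x`, enabled by `set -ex` in
`blockrebase-crash.sh`) prints the quoted part of a `[[ ... == pattern ]]`
word, escaping each character. A minimal, self-contained demonstration
(the pattern here is shortened from the real one):

```sh
#!/bin/bash
# Run a [[ ]] glob match under xtrace and capture the trace output;
# bash escapes each quoted pattern character, e.g. *\N\o\ \c...*
trace=$(bash -xc '[[ "" == *"No current"* ]]' 2>&1 || true)
echo "$trace"
```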
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush during `blockdev-backup` commands.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches the "concluded" state before crashing. I initially assumed the cause was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock it needs [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
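For what it's worth, the filter node is visible from outside during the
race window: `query-named-block-nodes` returns one entry per node on
`graph_bdrv_states`, and while a `block-stream` is in flight one entry
has driver `copy-on-read` (the `cor_filter_bs`). A sketch of picking it
out of a reply; the JSON below is a hypothetical, heavily abbreviated
reply (the `#block1234` name is made up), not captured output:

```sh
#!/bin/bash
# Hypothetical, abbreviated query-named-block-nodes reply; real replies
# carry many more fields per node. In practice this would come from:
#   virsh qemu-monitor-command n0 query-named-block-nodes
reply='{"return":[
  {"node-name":"libvirt-2-format","drv":"qcow2"},
  {"node-name":"#block1234","drv":"copy-on-read"}]}'
# Print "node-name driver" per node; the copy-on-read entry is the
# implicit filter inserted by block-stream.
nodes=$(echo "$reply" | python3 -c '
import json, sys
for node in json.load(sys.stdin)["return"]:
    print(node["node-name"], node["drv"])
')
echo "$nodes"
```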
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (14 preceding siblings ...)
2026-03-11 17:57 ` Wesley Hershberger
@ 2026-03-12 14:58 ` Hector CAO
2026-03-12 15:10 ` Wesley Hershberger
` (13 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Hector CAO @ 2026-03-12 14:58 UTC (permalink / raw)
To: qemu-devel
Hello Wesley,
I assume this needs a re-upload from Nick or someone else?
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
In Progress
Status in qemu source package in Noble:
In Progress
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (15 preceding siblings ...)
2026-03-12 14:58 ` Hector CAO
@ 2026-03-12 15:10 ` Wesley Hershberger
2026-03-19 12:49 ` [Bug 2126951] Please test proposed package Timo Aaltonen
` (12 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-12 15:10 UTC (permalink / raw)
To: qemu-devel
Yes, that's correct. The RISC-V test build failures look spurious (no
logs); I've restarted them.
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
In Progress
Status in qemu source package in Noble:
In Progress
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
* [Bug 2126951] Please test proposed package
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (16 preceding siblings ...)
2026-03-12 15:10 ` Wesley Hershberger
@ 2026-03-19 12:49 ` Timo Aaltonen
2026-03-19 12:51 ` Timo Aaltonen
` (11 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Timo Aaltonen @ 2026-03-19 12:49 UTC (permalink / raw)
To: qemu-devel
Hello Wesley, or anyone else affected,
Accepted qemu into questing-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/qemu/1:10.1.0+ds-5ubuntu2.5 in a
few hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us in getting this
update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and what testing has
been performed on it, and change the tag from
verification-needed-questing to verification-done-questing. If it does
not fix the bug for you, please add a comment stating that, and change
the tag to verification-failed-questing. In either case, without details
of your testing we will not be able to proceed.
Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!
N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
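For reference, a minimal sketch of enabling the pocket by hand (the wiki
page above covers pinning; the suite name and the `qemu-system-x86`
package choice here are assumptions for this particular SRU):

```sh
#!/bin/bash
# Build the -proposed sources line for questing; as root it would be
# appended to /etc/apt/sources.list.d/proposed.list.
line="deb http://archive.ubuntu.com/ubuntu questing-proposed main universe"
echo "$line"
# Then install only qemu from -proposed instead of everything:
#   sudo apt update
#   sudo apt install -t questing-proposed qemu-system-x86
```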
** Changed in: qemu (Ubuntu Noble)
Status: In Progress => Fix Committed
** Tags added: verification-needed-noble
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
    echo "Missing domain name"
    exit 1
fi

./query-named-block-nodes.sh "${domain}" &
query_pid=$!

while [ -n "$(virsh list --uuid)" ]; do
    snap="snap0-$(uuidgen)"
    virsh snapshot-create-as "${domain}" \
        --name "${snap}" \
        --disk-only file= \
        --diskspec vda,snapshot=no \
        --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
        --atomic \
        --no-metadata
    virsh blockpull "${domain}" vdb
    while bjr=$(virsh blockjob "$domain" vdb); do
        if [[ "$bjr" == *"No current block job for"* ]]; then
            break
        fi
    done
done

kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
Then run the scripts to trigger the crash (you may need to manually
kill `query-named-block-nodes.sh`):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on device".
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` commands.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the block job, since the job reaches the "concluded" state before crashing. I initially assumed the cause was one of the following:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
* [Bug 2126951] Please test proposed package
From: Timo Aaltonen @ 2026-03-19 12:51 UTC (permalink / raw)
To: qemu-devel
Hello Wesley, or anyone else affected,
Accepted qemu into noble-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/qemu/1:8.2.2+ds-0ubuntu1.14 in a
few hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us getting this
update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package, and change the tag from verification-needed-noble
to verification-done-noble. If it does not fix the bug for you, please add
a comment stating that, and change the tag to verification-failed-noble.
In either case, without details of your testing we will not be able to proceed.
Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!
N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
** Changed in: qemu (Ubuntu Jammy)
Status: In Progress => Fix Committed
** Tags added: verification-needed-jammy
* [Bug 2126951] Please test proposed package
From: Timo Aaltonen @ 2026-03-19 12:53 UTC (permalink / raw)
To: qemu-devel
Hello Wesley, or anyone else affected,
Accepted qemu into jammy-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/qemu/1:6.2+dfsg-2ubuntu6.29 in a
few hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us getting this
update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package, and change the tag from verification-needed-jammy
to verification-done-jammy. If it does not fix the bug for you, please add
a comment stating that, and change the tag to verification-failed-jammy.
In either case, without details of your testing we will not be able to proceed.
Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!
N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (19 preceding siblings ...)
2026-03-19 12:53 ` Timo Aaltonen
@ 2026-03-19 14:30 ` Michael Tokarev
2026-03-19 14:37 ` Wesley Hershberger
` (8 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Michael Tokarev @ 2026-03-19 14:30 UTC (permalink / raw)
To: qemu-devel
Shouldn't this bug and fix be reflected in the upstream qemu too?
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04 through
25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the scripts to cause the crash (you may need to manually kill
`query-named-block-nodes.sh`):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` QMPs.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (20 preceding siblings ...)
2026-03-19 14:30 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Michael Tokarev
@ 2026-03-19 14:37 ` Wesley Hershberger
2026-03-19 20:18 ` Michael Tokarev
` (7 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-19 14:37 UTC (permalink / raw)
To: qemu-devel
Hey Michael,
The fix for this landed as 9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
upstream, present in v10.2.0-rc1 and released in v10.2.0. I'll add the
upstream link to the bug description.
** Description changed:
[ Impact ]
When running `block-stream` and `query-named-block-nodes` concurrently,
a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while performing
concurrent libvirt `virDomainBlockPull` calls on the same VM/different
disks. The race condition occurs at the end of the `block-stream` QMP;
libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-block-
nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04 through
25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
- virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
+ virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
- echo "Missing domain name"
- exit 1
+ echo "Missing domain name"
+ exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
- snap="snap0-$(uuidgen)"
-
- virsh snapshot-create-as "${domain}" \
- --name "${snap}" \
- --disk-only file= \
- --diskspec vda,snapshot=no \
- --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
- --atomic \
- --no-metadata
-
- virsh blockpull "${domain}" vdb
-
- while bjr=$(virsh blockjob "$domain" vdb); do
- if [[ "$bjr" == *"No current block job for"* ]] ; then
- break;
- fi;
- done;
+ snap="snap0-$(uuidgen)"
+
+ virsh snapshot-create-as "${domain}" \
+ --name "${snap}" \
+ --disk-only file= \
+ --diskspec vda,snapshot=no \
+ --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
+ --atomic \
+ --no-metadata
+
+ virsh blockpull "${domain}" vdb
+
+ while bjr=$(virsh blockjob "$domain" vdb); do
+ if [[ "$bjr" == *"No current block job for"* ]] ; then
+ break;
+ fi;
+ done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
- -n n0 \
- --description "Test noble minimal" \
- --os-variant=ubuntu24.04 \
- --ram=1024 --vcpus=2 \
- --import \
- --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
- --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
- --graphics none \
- --network network=default \
- --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
+ -n n0 \
+ --description "Test noble minimal" \
+ --os-variant=ubuntu24.04 \
+ --ram=1024 --vcpus=2 \
+ --import \
+ --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
+ --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
+ --graphics none \
+ --network network=default \
+ --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the scripts to cause the crash (you may need to manually kill
`query-named-block-nodes.sh`):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would expect
to see failures when executing these QMP commands (or the libvirt APIs
that use them, `virDomainBlockPull` and `virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` QMPs.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue that
was later resolved in [6] (see the end of [7] and replies for upstream
discussion). Both [5] and [6] are present in QEMU 6.2.0, so the
assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
- at block/qapi.c:62
+ at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
- at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
+ at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
- errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
+ errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
- at qapi/qapi-commands-block-core.c:553
+ at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
- user_data=<optimized out>) at util/async.c:361
+ user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the `cor_filter_bs`,
added during a `block-stream`) was removed from the list of block nodes
_for the disk_ when the operation finished, but not removed from the
global list of block nodes until later (this is the window for the
race). The patch keeps the block node in the disk's list until it is
dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
+ [6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04 through
25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the scripts to cause the crash (you may need to manually kill
`query-named-block-nodes.sh`):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` QMPs.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (21 preceding siblings ...)
2026-03-19 14:37 ` Wesley Hershberger
@ 2026-03-19 20:18 ` Michael Tokarev
2026-03-20 2:23 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.5) Ubuntu SRU Bot
` (6 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Michael Tokarev @ 2026-03-19 20:18 UTC (permalink / raw)
To: qemu-devel
Ah, ok, I didn't look that far, thinking it should be a recent change.
If that's the case, this commit should be picked up for the qemu 10.0.x
series. It's interesting that I missed it during the 10.2 RCs.
Thanks!
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.5)
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (22 preceding siblings ...)
2026-03-19 20:18 ` Michael Tokarev
@ 2026-03-20 2:23 ` Ubuntu SRU Bot
2026-03-20 3:11 ` [Bug 2126951] Autopkgtest regression report (qemu/1:8.2.2+ds-0ubuntu1.14) Ubuntu SRU Bot
` (5 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Ubuntu SRU Bot @ 2026-03-20 2:23 UTC (permalink / raw)
To: qemu-devel
All autopkgtests for the newly accepted qemu (1:10.1.0+ds-5ubuntu2.5) for questing have finished running.
The following regressions have been reported in tests triggered by the package:
freedom-maker/0.34 (armhf)
multipath-tools/0.11.1-3ubuntu2 (ppc64el)
nova/unknown (ppc64el)
qemu/1:10.1.0+ds-5ubuntu2.5 (armhf)
systemd/257.9-0ubuntu2.1 (armhf)
Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].
https://people.canonical.com/~ubuntu-archive/proposed-migration/questing/update_excuses.html#qemu
[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions
Thank you!
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Autopkgtest regression report (qemu/1:8.2.2+ds-0ubuntu1.14)
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (23 preceding siblings ...)
2026-03-20 2:23 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.5) Ubuntu SRU Bot
@ 2026-03-20 3:11 ` Ubuntu SRU Bot
2026-03-20 5:12 ` [Bug 2126951] Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.29) Ubuntu SRU Bot
` (4 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Ubuntu SRU Bot @ 2026-03-20 3:11 UTC (permalink / raw)
To: qemu-devel
All autopkgtests for the newly accepted qemu (1:8.2.2+ds-0ubuntu1.14) for noble have finished running.
The following regressions have been reported in tests triggered by the package:
cryptsetup/2:2.7.0-1ubuntu4.2 (s390x)
freedom-maker/0.33 (armhf)
glance/2:28.1.0-0ubuntu1.2 (armhf)
glib2.0/2.80.0-6ubuntu3.8 (ppc64el, s390x)
glib2.0/unknown (armhf)
Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].
https://people.canonical.com/~ubuntu-archive/proposed-migration/noble/update_excuses.html#qemu
[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions
Thank you!
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-
server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the script to cause the crash (you may need to manually kill
query-named-block-jobs.sh):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` QMPs.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
    `qemuBlockGetNamedNodeData` [3]
      `qemuMonitorBlockGetNamedNodeData` [4]
        `qemuMonitorJSONBlockGetNamedNodeData` [5]
          `qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.29)
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (24 preceding siblings ...)
2026-03-20 3:11 ` [Bug 2126951] Autopkgtest regression report (qemu/1:8.2.2+ds-0ubuntu1.14) Ubuntu SRU Bot
@ 2026-03-20 5:12 ` Ubuntu SRU Bot
2026-03-26 19:56 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
` (3 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Ubuntu SRU Bot @ 2026-03-20 5:12 UTC (permalink / raw)
To: qemu-devel
All autopkgtests for the newly accepted qemu (1:6.2+dfsg-2ubuntu6.29) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:
livecd-rootfs/2.765.55 (amd64, ppc64el)
nova/unknown (ppc64el)
systemd/249.11-0ubuntu3.17 (armhf)
systemd/unknown (ppc64el)
Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].
https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#qemu
[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions
Thank you!
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Committed
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Committed
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP command; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, from 22.04
through 25.10.
[1] qemuBlockJobProcessEventCompletedPull
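The reporter's trigger (concurrent pulls on one VM, different disks) boils down to the pattern below; `virsh` is replaced by an invented `virsh_blockpull_stub` so the sketch is runnable anywhere, and `n0`, `vdb`, `vdc` are assumed names.

```shell
#!/bin/bash
# Stub standing in for `virsh blockpull <domain> <disk>`; on a real host
# the stub would be replaced by virsh itself.
virsh_blockpull_stub() {
  echo "blockpull $1 $2: started"
  sleep 0.2
  echo "blockpull $1 $2: done"
}

# Two pulls on the same VM, different disks, running concurrently --
# each completion races with libvirt's query-named-block-nodes handler.
virsh_blockpull_stub n0 vdb &
virsh_blockpull_stub n0 vdc &
wait
echo "both pulls finished"
```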
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the script to cause the crash (you may need to manually kill
query-named-block-nodes.sh):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` QMP commands.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since the job reaches the "concluded" state before the crash. I initially assumed the cause was one of the following:
- `stream_clean` is freeing or modifying the `cor_filter_bs` without holding a required lock [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children of a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
    `qemuBlockGetNamedNodeData` [3]
      `qemuMonitorBlockGetNamedNodeData` [4]
        `qemuMonitorJSONBlockGetNamedNodeData` [5]
          `qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (25 preceding siblings ...)
2026-03-20 5:12 ` [Bug 2126951] Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.29) Ubuntu SRU Bot
@ 2026-03-26 19:56 ` Wesley Hershberger
2026-03-26 19:56 ` Wesley Hershberger
` (2 subsequent siblings)
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-26 19:56 UTC (permalink / raw)
To: qemu-devel
The crash is reproducible with the current version:
wesley@qv0:~$ apt policy qemu-system-x86
qemu-system-x86:
Installed: 1:10.1.0+ds-5ubuntu2.4
Candidate: 1:10.1.0+ds-5ubuntu2.4
Version table:
*** 1:10.1.0+ds-5ubuntu2.4 500
500 http://archive.ubuntu.com/ubuntu questing-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu questing-security/main amd64 Packages
100 /var/lib/dpkg/status
1:10.1.0+ds-5ubuntu2 500
500 http://archive.ubuntu.com/ubuntu questing/main amd64 Packages
### Verification done Questing ###
wesley@qv0:~$ apt policy qemu-system-x86
qemu-system-x86:
Installed: 1:10.1.0+ds-5ubuntu2.5
Candidate: 1:10.1.0+ds-5ubuntu2.5
Version table:
*** 1:10.1.0+ds-5ubuntu2.5 100
100 http://archive.ubuntu.com/ubuntu questing-proposed/main amd64 Packages
100 /var/lib/dpkg/status
1:10.1.0+ds-5ubuntu2.4 500
500 http://archive.ubuntu.com/ubuntu questing-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu questing-security/main amd64 Packages
1:10.1.0+ds-5ubuntu2 500
500 http://archive.ubuntu.com/ubuntu questing/main amd64 Packages
wesley@qv0:~$ ./provision.sh
+ wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
--2026-03-26 08:56:59-- https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
Resolving cloud-images.ubuntu.com (cloud-images.ubuntu.com)... 185.125.190.40, 185.125.190.37, 2620:2d:4000:1::17, ...
Connecting to cloud-images.ubuntu.com (cloud-images.ubuntu.com)|185.125.190.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 629987328 (601M) [application/octet-stream]
Saving to: ‘noble-server-cloudimg-amd64.img.1’
noble-server-cloudimg-amd64.img.1
100%[===================================================================>]
600.80M 1.60MB/s in 21s
2026-03-26 08:57:21 (28.4 MB/s) - ‘noble-server-cloudimg-amd64.img.1’
saved [629987328/629987328]
+ sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
+ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
Formatting '/var/lib/libvirt/images/n0-blk0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
+ touch network-config
+ touch meta-data
+ touch user-data
+ virt-install -n n0 --description 'Test noble minimal' --os-variant=ubuntu24.04 --ram=1024 --vcpus=2 --import --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 --graphics none --network network=default --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
...
wesley@qv0:~$ virsh list
Id Name State
----------------------
2 n0 running
wesley@qv0:~$ ./blockrebase-crash.sh n0
...
My connection to the testbed died about 5 hours into the test; it
filled up 10GB with snapshots and didn't crash:
wesley@qv0:~$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 391M 1.4M 389M 1% /run
efivarfs 256K 44K 208K 18% /sys/firmware/efi/efivars
/dev/sda1 19G 11G 8.3G 55% /
wesley@qv0:~$ virsh list
Id Name State
----------------------
2 n0 running
### Verification done Questing ###
** Tags removed: verification-needed-questing
** Tags added: verification-done-questing
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Released
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Released
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP command; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, from 22.04
through 25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the script to cause the crash (you may need to manually kill
query-named-block-nodes.sh):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` QMP commands.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since the job reaches the "concluded" state before the crash. I initially assumed the cause was one of the following:
- `stream_clean` is freeing or modifying the `cor_filter_bs` without holding a required lock [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children of a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
    `qemuBlockGetNamedNodeData` [3]
      `qemuMonitorBlockGetNamedNodeData` [4]
        `qemuMonitorJSONBlockGetNamedNodeData` [5]
          `qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (26 preceding siblings ...)
2026-03-26 19:56 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
@ 2026-03-26 19:56 ` Wesley Hershberger
2026-03-26 19:57 ` Wesley Hershberger
2026-03-26 19:57 ` Wesley Hershberger
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-26 19:56 UTC (permalink / raw)
To: qemu-devel
The crash is reproducible with the current version:
wesley@nv0:~$ apt policy qemu-system-x86
qemu-system-x86:
Installed: 1:8.2.2+ds-0ubuntu1.13
Candidate: 1:8.2.2+ds-0ubuntu1.13
Version table:
*** 1:8.2.2+ds-0ubuntu1.13 500
500 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages
100 /var/lib/dpkg/status
1:8.2.2+ds-0ubuntu1 500
500 http://archive.ubuntu.com/ubuntu noble/main amd64 Packages
### Verification done Noble ###
wesley@nv0:~$ apt policy qemu-system-x86
qemu-system-x86:
Installed: 1:8.2.2+ds-0ubuntu1.14
Candidate: 1:8.2.2+ds-0ubuntu1.14
Version table:
*** 1:8.2.2+ds-0ubuntu1.14 100
100 http://archive.ubuntu.com/ubuntu noble-proposed/main amd64 Packages
100 /var/lib/dpkg/status
1:8.2.2+ds-0ubuntu1.13 500
500 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages
1:8.2.2+ds-0ubuntu1 500
500 http://archive.ubuntu.com/ubuntu noble/main amd64 Packages
wesley@nv0:~$ ./provision.sh
+ wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
--2026-03-26 08:56:55-- https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
Resolving cloud-images.ubuntu.com (cloud-images.ubuntu.com)... 185.125.190.37, 185.125.190.40, 2620:2d:4000:1::1a, ...
Connecting to cloud-images.ubuntu.com (cloud-images.ubuntu.com)|185.125.190.37|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 629987328 (601M) [application/octet-stream]
Saving to: ‘noble-server-cloudimg-amd64.img.1’
noble-server-cloudimg-amd64.img.1
100%[====================================================================>]
600.80M 26.4MB/s in 38s
2026-03-26 08:57:34 (16.0 MB/s) - ‘noble-server-cloudimg-amd64.img.1’
saved [629987328/629987328]
+ sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
+ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
Formatting '/var/lib/libvirt/images/n0-blk0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
+ touch network-config
+ touch meta-data
+ touch user-data
+ virt-install -n n0 --description 'Test noble minimal' --os-variant=ubuntu24.04 --ram=1024 --vcpus=2 --import --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 --graphics none --network network=default --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
...
wesley@nv0:~$ virsh list
Id Name State
----------------------
1 n0 running
wesley@nv0:~$ ./blockrebase-crash.sh n0
...
My connection to the testbed died about 5 hours into the test; it
filled up 8GB with snapshots and didn't crash:
wesley@nv0:~$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 391M 1.3M 390M 1% /run
efivarfs 256K 44K 208K 18% /sys/firmware/efi/efivars
/dev/sda1 19G 9.8G 8.6G 54% /
...
wesley@nv0:~$ virsh list
Id Name State
----------------------
1 n0 running
### Verification done Noble ###
** Tags removed: verification-needed-noble
** Tags added: verification-done-noble
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Released
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Released
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04
through 25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
```
And run the script to cause the crash (you may need to manually kill
query-named-block-nodes.sh):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash.sh n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` commands.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (27 preceding siblings ...)
2026-03-26 19:56 ` Wesley Hershberger
@ 2026-03-26 19:57 ` Wesley Hershberger
2026-03-26 19:57 ` Wesley Hershberger
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-26 19:57 UTC (permalink / raw)
To: qemu-devel
The crash is reproducible with the current version:
wesley@jv0:~$ apt policy qemu-system-x86
qemu-system-x86:
Installed: 1:6.2+dfsg-2ubuntu6.28
Candidate: 1:6.2+dfsg-2ubuntu6.28
Version table:
*** 1:6.2+dfsg-2ubuntu6.28 500
500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
100 /var/lib/dpkg/status
1:6.2+dfsg-2ubuntu6 500
500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
wesley@jv0:~$ apt policy qemu-system-x86
qemu-system-x86:
Installed: 1:6.2+dfsg-2ubuntu6.29
Candidate: 1:6.2+dfsg-2ubuntu6.29
Version table:
*** 1:6.2+dfsg-2ubuntu6.29 500
500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
100 /var/lib/dpkg/status
1:6.2+dfsg-2ubuntu6.28 500
500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
1:6.2+dfsg-2ubuntu6 500
500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
wesley@jv0:~$ ./provision.sh
+ wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
--2026-03-26 08:56:52-- https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
Resolving cloud-images.ubuntu.com (cloud-images.ubuntu.com)... 185.125.190.37, 185.125.190.40, 2620:2d:4000:1::1a, ...
Connecting to cloud-images.ubuntu.com (cloud-images.ubuntu.com)|185.125.190.37|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 629987328 (601M) [application/octet-stream]
Saving to: ‘noble-server-cloudimg-amd64.img.1’
noble-server-cloudimg-amd64.img.1
100%[=========================================================================>]
600.80M 22.0MB/s in 42s
2026-03-26 08:57:35 (14.2 MB/s) - ‘noble-server-cloudimg-amd64.img.1’
saved [629987328/629987328]
+ sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
+ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
Formatting '/var/lib/libvirt/images/n0-blk0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
+ touch network-config
+ touch meta-data
+ touch user-data
+ virt-install -n n0 --description 'Test noble minimal' --os-variant=ubuntu24.04 --ram=1024 --vcpus=2 --import --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 --graphics none --network network=default --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
...
wesley@jv0:~$ virsh list
Id Name State
----------------------
2 n0 running
wesley@jv0:~$ ./blockrebase-crash.sh n0
...
My connection to the testbed died about 5 hours into the test; it
filled up 8GB with snapshots and didn't crash:
wesley@jv0:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 20G 9.0G 11G 47% /
wesley@jv0:~$ virsh list
Id Name State
----------------------
2 n0 running
### Verification done Jammy ###
** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
` (28 preceding siblings ...)
2026-03-26 19:57 ` Wesley Hershberger
@ 2026-03-26 19:57 ` Wesley Hershberger
29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-26 19:57 UTC (permalink / raw)
To: qemu-devel
This bug was fixed in qemu 1:10.2.1+ds-1ubuntu2 in Resolute.
** Changed in: qemu (Ubuntu Resolute)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951
Title:
`block-stream` segfault with concurrent `query-named-block-nodes`
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Released
Status in qemu source package in Jammy:
Fix Committed
Status in qemu source package in Noble:
Fix Committed
Status in qemu source package in Plucky:
Won't Fix
Status in qemu source package in Questing:
Fix Committed
Status in qemu source package in Resolute:
Fix Released
Bug description:
[ Impact ]
When running `block-stream` and `query-named-block-nodes`
concurrently, a null-pointer dereference causes QEMU to segfault.
The original reporter of this issue experienced the bug while
performing concurrent libvirt `virDomainBlockPull` calls on the same
VM/different disks. The race condition occurs at the end of the
`block-stream` QMP; libvirt's handler for a completed `block-stream`
(`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
block-nodes` (see "libvirt trace" below for a full trace).
This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
25.10.
[1] qemuBlockJobProcessEventCompletedPull
[ Test Plan ]
```
sudo apt install libvirt-daemon-system virtinst
```
In `query-named-block-nodes.sh`:
```sh
#!/bin/bash
while true; do
virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done
```
In `blockrebase-crash.sh`:
```sh
#!/bin/bash
set -ex
domain="$1"
if [ -z "${domain}" ]; then
echo "Missing domain name"
exit 1
fi
./query-named-block-nodes.sh "${domain}" &
query_pid=$!
while [ -n "$(virsh list --uuid)" ]; do
snap="snap0-$(uuidgen)"
virsh snapshot-create-as "${domain}" \
--name "${snap}" \
--disk-only file= \
--diskspec vda,snapshot=no \
--diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
--atomic \
--no-metadata
virsh blockpull "${domain}" vdb
while bjr=$(virsh blockjob "$domain" vdb); do
if [[ "$bjr" == *"No current block job for"* ]] ; then
break;
fi;
done;
done
kill "${query_pid}"
```
`provision.sh` (`Ctrl + ]` after boot):
```sh
#!/bin/bash
set -ex
wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-
server-cloudimg-amd64.img
sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
touch network-config
touch meta-data
touch user-data
virt-install \
-n n0 \
--description "Test noble minimal" \
--os-variant=ubuntu24.04 \
--ram=1024 --vcpus=2 \
--import \
--disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
--disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
--graphics none \
--network network=default \
--cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
```
And run the script to cause the crash (you may need to manually kill
query-named-block-jobs.sh):
```sh
chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
./provision.sh
./blockrebase-crash n0
```
Expected behavior: `blockrebase-crash.sh` runs until "No space left on
device"
Actual behavior: QEMU crashes after a few iterations:
```
Block Pull: [81.05 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
Block Pull: [97.87 %]+ bjr=
+ [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
++ virsh blockjob n0 vdb
error: Unable to read from monitor: Connection reset by peer
error: Unable to read from monitor: Connection reset by peer
+ bjr=
++ virsh list --uuid
+ '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
++ uuidgen
+ snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
error: Requested operation is not valid: domain is not running
Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ virsh blockpull n0 vdb
error: Requested operation is not valid: domain is not running
error: Requested operation is not valid: domain is not running
wesley@nv0:~$ error: Requested operation is not valid: domain is not running
```
[ Where problems could occur ]
The only codepaths affected by this change are `block-stream` and
`blockdev-backup` [1][2]. If the code is somehow broken, we would
expect to see failures when executing these QMP commands (or the
libvirt APIs that use them, `virDomainBlockPull` and
`virDomainBackupBegin` [3][4]).
As noted in the upstream commit message, the change does cause an
additional flush to occur during `blockdev-backup` QMPs.
The patch that was ultimately merged upstream was a revert of most of
[5]. _That_ patch was a workaround for a blockdev permissions issue
that was later resolved in [6] (see the end of [7] and replies for
upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
the assumptions that led us to the upstream solution hold for Jammy.
[1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
[2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
[3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
[4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
[5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
[6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
[7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
[ Other info ]
Backtrace from the coredump (source at [1]):
```
#0 bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1 0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
at block/qapi.c:62
#2 0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3 0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4 qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
at qapi/qapi-commands-block-core.c:553
#5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7 0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8 0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>) at util/async.c:361
#9 0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
```
The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
- `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
- `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
should not be able to observe a NULL list of children.
`query-named-block-nodes` iterates the global list of block nodes
`graph_bdrv_states` [5]. The offending block node (the
`cor_filter_bs`, added during a `block-stream`) was removed from the
list of block nodes _for the disk_ when the operation finished, but
not removed from the global list of block nodes until later (this is
the window for the race). The patch keeps the block node in the disk's
list until it is dropped at the end of the blockjob.
[1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
[2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
[3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
[4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
[5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
[ libvirt trace ]
`qemuBlockJobProcessEventCompletedPull` [1]
`qemuBlockJobProcessEventCompletedPullBitmaps` [2]
`qemuBlockGetNamedNodeData` [3]
`qemuMonitorBlockGetNamedNodeData` [4]
`qemuMonitorJSONBlockGetNamedNodeData` [5]
`qemuMonitorJSONQueryNamedBlockNodes` [6]
[1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
[2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
[3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
[4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
[5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
[6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions
Thread overview: 30+ messages
[not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
2025-10-06 18:38 ` Bug Watch Updater
2025-10-07 9:38 ` Jonas Jelten
2025-10-08 7:03 ` Christian Ehrhardt
2025-10-21 20:01 ` Wesley Hershberger
2025-11-12 18:46 ` Bug Watch Updater
2026-01-12 12:27 ` Athos Ribeiro
2026-01-12 14:19 ` Wesley Hershberger
2026-02-09 16:19 ` Wesley Hershberger
2026-02-09 16:36 ` Launchpad Bug Tracker
2026-02-18 14:31 ` Wesley Hershberger
2026-02-27 9:56 ` Timo Aaltonen
2026-02-27 16:59 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.3) Ubuntu SRU Bot
2026-03-06 10:48 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Timo Aaltonen
2026-03-11 17:57 ` Wesley Hershberger
2026-03-12 14:58 ` Hector CAO
2026-03-12 15:10 ` Wesley Hershberger
2026-03-19 12:49 ` [Bug 2126951] Please test proposed package Timo Aaltonen
2026-03-19 12:51 ` Timo Aaltonen
2026-03-19 12:53 ` Timo Aaltonen
2026-03-19 14:30 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Michael Tokarev
2026-03-19 14:37 ` Wesley Hershberger
2026-03-19 20:18 ` Michael Tokarev
2026-03-20 2:23 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.5) Ubuntu SRU Bot
2026-03-20 3:11 ` [Bug 2126951] Autopkgtest regression report (qemu/1:8.2.2+ds-0ubuntu1.14) Ubuntu SRU Bot
2026-03-20 5:12 ` [Bug 2126951] Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.29) Ubuntu SRU Bot
2026-03-26 19:56 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
2026-03-26 19:56 ` Wesley Hershberger
2026-03-26 19:57 ` Wesley Hershberger
2026-03-26 19:57 ` Wesley Hershberger