public inbox for qemu-devel@nongnu.org
 help / color / mirror / Atom feed
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
@ 2025-10-06 17:21 ` Wesley Hershberger
  2025-10-06 18:38 ` Bug Watch Updater
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2025-10-06 17:21 UTC (permalink / raw)
  To: qemu-devel

** Bug watch added: gitlab.com/qemu-project/qemu/-/issues #3149
   https://gitlab.com/qemu-project/qemu/-/issues/3149

** Also affects: qemu via
   https://gitlab.com/qemu-project/qemu/-/issues/3149
   Importance: Unknown
       Status: Unknown

** Also affects: qemu (Ubuntu Questing)
   Importance: Medium
     Assignee: Wesley Hershberger (whershberger)
       Status: Confirmed

** Also affects: qemu (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: qemu (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: qemu (Ubuntu Plucky)
   Importance: Undecided
       Status: New

** Changed in: qemu (Ubuntu Jammy)
       Status: New => Confirmed

** Changed in: qemu (Ubuntu Noble)
       Status: New => Confirmed

** Changed in: qemu (Ubuntu Plucky)
       Status: New => Confirmed

** Changed in: qemu (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: qemu (Ubuntu Noble)
   Importance: Undecided => Medium

** Changed in: qemu (Ubuntu Plucky)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Unknown
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04
  through 25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "${domain}" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]]; then
              break
          fi
      done
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to kill
  query-named-block-nodes.sh manually):
  ```sh
  ./blockrebase-crash.sh n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the
  blockjob, since the job reaches the "concluded" state before crashing. I
  assume the cause is one of the following:
  - `stream_clean` frees or modifies `cor_filter_bs` without holding a lock
    that it needs [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST
    of children of a filter bs can be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
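  If hypothesis (2) is right, the fix would be a defensive NULL check before
  the first child of a filter node is dereferenced. A minimal standalone
  sketch of that pattern follows; it is hypothetical, not QEMU source:
  `BdrvChild`, `BlockDriverState`, and `primary_child()` are simplified
  stand-ins, and the BSD <sys/queue.h> LIST_* macros substitute for QEMU's
  QLIST_*.
  ```c
  /*
   * Hypothetical sketch, NOT QEMU code: a filter node whose child list
   * was already emptied (e.g. by job cleanup) yields NULL, so callers
   * must check before dereferencing, as frame #0 above suggests.
   */
  #include <stdio.h>
  #include <sys/queue.h>

  #ifndef LIST_FIRST                  /* accessor fallback for older libcs */
  #define LIST_FIRST(head) ((head)->lh_first)
  #endif

  struct BdrvChild {
      const char *name;
      LIST_ENTRY(BdrvChild) next;
  };

  struct BlockDriverState {
      LIST_HEAD(, BdrvChild) children;
  };

  /* Returns the first child, or NULL when the list is empty. */
  static struct BdrvChild *primary_child(struct BlockDriverState *bs)
  {
      return LIST_FIRST(&bs->children);
  }

  int main(void)
  {
      struct BlockDriverState bs;
      LIST_INIT(&bs.children);        /* child list already emptied */

      struct BdrvChild *c = primary_child(&bs);
      if (c == NULL) {                /* the check the hypothesis calls for */
          puts("no-child");
          return 0;
      }
      /* Without the NULL check, c->name here would be the null deref. */
      printf("child: %s\n", c->name);
      return 0;
  }
  ```
  The same guard at the top of `bdrv_refresh_filename` would only paper over
  the race in hypothesis (1), so both sites likely need auditing.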

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
  2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
@ 2025-10-06 18:38 ` Bug Watch Updater
  2025-10-07  9:38 ` Jonas Jelten
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Bug Watch Updater @ 2025-10-06 18:38 UTC (permalink / raw)
  To: qemu-devel

** Changed in: qemu
       Status: Unknown => New


Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
  2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
  2025-10-06 18:38 ` Bug Watch Updater
@ 2025-10-07  9:38 ` Jonas Jelten
  2025-10-08  7:03 ` Christian Ehrhardt
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Jonas Jelten @ 2025-10-07  9:38 UTC (permalink / raw)
  To: qemu-devel

** Tags added: server-todo


Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (2 preceding siblings ...)
  2025-10-07  9:38 ` Jonas Jelten
@ 2025-10-08  7:03 ` Christian Ehrhardt
  2025-10-21 20:01 ` Wesley Hershberger
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Christian Ehrhardt @ 2025-10-08  7:03 UTC (permalink / raw)
  To: qemu-devel

** Tags removed: server-todo


Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (3 preceding siblings ...)
  2025-10-08  7:03 ` Christian Ehrhardt
@ 2025-10-21 20:01 ` Wesley Hershberger
  2025-11-12 18:46 ` Bug Watch Updater
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2025-10-21 20:01 UTC (permalink / raw)
  To: qemu-devel

** Also affects: qemu (Ubuntu Resolute)
   Importance: Medium
     Assignee: Wesley Hershberger (whershberger)
       Status: Confirmed

** Changed in: qemu (Ubuntu Plucky)
     Assignee: (unassigned) => Wesley Hershberger (whershberger)

** Changed in: qemu (Ubuntu Noble)
     Assignee: (unassigned) => Wesley Hershberger (whershberger)

** Changed in: qemu (Ubuntu Jammy)
     Assignee: (unassigned) => Wesley Hershberger (whershberger)


Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed
Status in qemu source package in Resolute:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query_named_block_nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  ./blockrebase-crash n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
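  The second hypothesis can be illustrated with a toy model. This is NOT QEMU's real code; every struct and function name below (`toy_bs`, `toy_refresh_filename`, etc.) is hypothetical, and it only sketches what a defensive NULL-children guard in a filename-refresh walker would look like:
  ```c
  #include <stddef.h>
  #include <stdio.h>

  /* Toy model -- not QEMU's real structures. A filter node normally
   * reports its child's filename; if a concurrently finishing block job
   * has already detached the child, an unguarded dereference of the
   * now-empty child list would segfault. */
  struct toy_child {
      const char *filename;
  };

  struct toy_bs {
      const char *node_name;
      struct toy_child *first_child; /* NULL once the job tears it down */
  };

  static const char *toy_refresh_filename(const struct toy_bs *bs)
  {
      if (bs->first_child == NULL) {
          /* Guarded path: fall back to the node name instead of
           * dereferencing a missing child. */
          return bs->node_name;
      }
      return bs->first_child->filename;
  }

  int main(void)
  {
      struct toy_child file = { "n0-blk0.qcow2" };
      struct toy_bs cor = { "cor-filter", &file };

      printf("%s\n", toy_refresh_filename(&cor)); /* child attached */
      cor.first_child = NULL;                     /* job concluded mid-query */
      printf("%s\n", toy_refresh_filename(&cor)); /* guard avoids the deref */
      return 0;
  }
  ```
  In the toy model the second call survives only because of the guard, which is the shape of fix the second bullet above would imply.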

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (4 preceding siblings ...)
  2025-10-21 20:01 ` Wesley Hershberger
@ 2025-11-12 18:46 ` Bug Watch Updater
  2026-01-12 12:27 ` Athos Ribeiro
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Bug Watch Updater @ 2025-11-12 18:46 UTC (permalink / raw)
  To: qemu-devel

** Changed in: qemu
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed
Status in qemu source package in Resolute:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  ./blockrebase-crash.sh n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (5 preceding siblings ...)
  2025-11-12 18:46 ` Bug Watch Updater
@ 2026-01-12 12:27 ` Athos Ribeiro
  2026-01-12 14:19 ` Wesley Hershberger
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Athos Ribeiro @ 2026-01-12 12:27 UTC (permalink / raw)
  To: qemu-devel

Since we are 3 days away from the plucky EOL, should we close that task
as wontfix?

https://lists.ubuntu.com/archives/ubuntu-security-
announce/2026-January/010065.html

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed
Status in qemu source package in Resolute:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  ./blockrebase-crash.sh n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (6 preceding siblings ...)
  2026-01-12 12:27 ` Athos Ribeiro
@ 2026-01-12 14:19 ` Wesley Hershberger
  2026-02-09 16:19 ` Wesley Hershberger
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-01-12 14:19 UTC (permalink / raw)
  To: qemu-devel

Thanks for the ping Athos; I won't be able to get to this before the
Plucky EOL.

The fix for this landed as 9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
upstream, present in v10.2.0-rc1 and released in v10.2.0.
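
The window the upstream commit closes can be sketched with a toy model.
This is not QEMU's real code; every name below is hypothetical. It only
models the ordering invariant: a node that query-named-block-nodes can
still see (on the global list) must never already be detached from its
disk:
```c
#include <stdio.h>

/* Toy model of the race -- all names hypothetical. */
struct toy_node {
    int on_global_list; /* query-named-block-nodes walks this list */
    int attached;       /* still wired into the disk's node chain  */
};

/* The crash window: globally visible but already detached. */
static int unsafe_window(const struct toy_node *n)
{
    return n->on_global_list && !n->attached;
}

int main(void)
{
    /* Old teardown order: detach from the disk first, unlist later.
     * A concurrent query between the two steps hits the window. */
    struct toy_node old_order = { 1, 1 };
    old_order.attached = 0;
    printf("old order unsafe: %d\n", unsafe_window(&old_order));

    /* Fixed order: drop from the global list first (queries no longer
     * see the node), keep it attached until then. No unsafe state is
     * ever observable. */
    struct toy_node fixed_order = { 1, 1 };
    fixed_order.on_global_list = 0;
    printf("fixed order unsafe: %d\n", unsafe_window(&fixed_order));
    return 0;
}
```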

** Changed in: qemu (Ubuntu Plucky)
       Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Confirmed
Status in qemu source package in Resolute:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  ./blockrebase-crash.sh n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (7 preceding siblings ...)
  2026-01-12 14:19 ` Wesley Hershberger
@ 2026-02-09 16:19 ` Wesley Hershberger
  2026-02-09 16:36 ` Launchpad Bug Tracker
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-02-09 16:19 UTC (permalink / raw)
  To: qemu-devel

** Description changed:

  [ Impact ]
  
  When running `block-stream` and `query-named-block-nodes` concurrently,
  a null-pointer dereference causes QEMU to segfault.
  
+ The original reporter of this issue experienced the bug while performing
+ concurrent libvirt `virDomainBlockPull` calls on the same VM/different
+ disks. The race condition occurs at the end of the `block-stream` QMP;
+ libvirt's handler for a completed `block-stream`
+ (`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-block-
+ nodes` (see "libvirt trace" below for a full trace).
+ 
  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
- 25.10. I have not yet reproduced the bug using an upstream build.
- 
- I will link the upstream bug report here as soon as I've written it.
- 
- [ Reproducer ]
+ 25.10.
+ 
+ [1] qemuBlockJobProcessEventCompletedPull
+ 
+ [ Test Plan ]
+ 
+ ```
+ sudo apt install libvirt-daemon-system virtinst
+ ```
  
  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash
  
  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```
  
  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash
  
  set -ex
  
  domain="$1"
  
  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi
  
- ./query_named_block_nodes.sh "${domain}" &
+ ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!
  
  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"
  
      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata
  
      virsh blockpull "${domain}" vdb
  
      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done
  
  kill "${query_pid}"
  ```
  
- Provision (`Ctrl + ]` after boot):
- ```sh
- wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
+ `provision.sh` (`Ctrl + ]` after boot):
+ ```sh
+ #!/bin/bash
+ 
+ set -ex
+ 
+ wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
  
  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
  
  touch network-config
  touch meta-data
  touch user-data
  
  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```
  
  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
+ chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
+ ./provision.sh
  ./blockrebase-crash n0
  ```
  
- [ Details ]
+ Expected behavior: `blockrebase-crash.sh` runs until "No space left on
+ device"
+ 
+ Actual behavior: QEMU crashes after a few iterations:
+ ```
+ Block Pull: [81.05 %]+ bjr=
+ + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
+ ++ virsh blockjob n0 vdb
+ Block Pull: [97.87 %]+ bjr=
+ + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
+ ++ virsh blockjob n0 vdb
+ error: Unable to read from monitor: Connection reset by peer
+ error: Unable to read from monitor: Connection reset by peer
+ + bjr=
+ ++ virsh list --uuid
+ + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
+ ++ uuidgen
+ + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
+ + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
+ error: Requested operation is not valid: domain is not running
+ Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
+ + virsh blockpull n0 vdb
+ error: Requested operation is not valid: domain is not running
+ error: Requested operation is not valid: domain is not running
+ 
+ wesley@nv0:~$ error: Requested operation is not valid: domain is not running
+ ```
+ 
+ [ Where problems could occur ]
+ 
+ The only codepaths affected by this change are `block-stream` and
+ `blockdev-backup` [1][2]. If the code is somehow broken, we would expect
+ to see failures when executing these QMP commands (or the libvirt APIs
+ that use them, `virDomainBlockPull` and `virDomainBackupBegin` [3][4]).
+ 
+ As noted in the upstream commit message, the change does cause an
+ additional flush to occur during `blockdev-backup` QMPs.
+ 
+ The patch that was ultimately merged upstream was a revert of most of
+ [5]. _That_ patch was a workaround for a blockdev permissions issue that
+ was later resolved in [6] (see the end of [7] and replies for upstream
+ discussion). Both [5] and [6] are present in QEMU 6.2.0, so the
+ assumptions that led us to the upstream solution hold for Jammy.
+ 
+ [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
+ [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
+ [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
+ [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
+ [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
+ [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
+ [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
+ 
+ [ Other info ]
  
  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```
  
- The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
+ The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
+ 
+ Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
+ should not be able to observe a NULL list of children.
+ 
+ `query-named-block-nodes` iterates the global list of block nodes
+ `graph_bdrv_states` [5]. The offending block node (the `cor_filter_bs`,
+ added during a `block-stream`) was removed from the list of block nodes
+ _for the disk_ when the operation finished, but not removed from the
+ global list of block nodes until later (this is the window for the
+ race). The patch keeps the block node in the disk's list until it is
+ dropped at the end of the blockjob.
  
  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
+ [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
+ [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
+ 
+ [ libvirt trace ]
+ `qemuBlockJobProcessEventCompletedPull` [1]
+ `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
+ `qemuBlockGetNamedNodeData` [3]
+ `qemuMonitorBlockGetNamedNodeData` [4]
+ `qemuMonitorJSONBlockGetNamedNodeData` [5]
+ `qemuMonitorJSONQueryNamedBlockNodes` [6]
+ 
+ [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
+ [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
+ [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
+ [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
+ [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
+ [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

** Changed in: qemu (Ubuntu Questing)
       Status: Confirmed => In Progress

** Changed in: qemu (Ubuntu Noble)
       Status: Confirmed => In Progress

** Changed in: qemu (Ubuntu Jammy)
       Status: Confirmed => In Progress

** Changed in: qemu (Ubuntu Resolute)
     Assignee: Wesley Hershberger (whershberger) => (unassigned)

** Changed in: qemu (Ubuntu Resolute)
       Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Jammy:
  In Progress
Status in qemu source package in Noble:
  In Progress
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  In Progress
Status in qemu source package in Resolute:
  In Progress

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM against different disks. The race occurs at the end of the
  `block-stream` QMP command; libvirt's handler for a completed
  `block-stream` (`qemuBlockJobProcessEventCompletedPull` [1]) calls
  `query-named-block-nodes` (see "libvirt trace" below for the full
  call chain).

  This occurs in every version of QEMU shipped with Ubuntu, 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```sh
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the scripts to cause the crash (you may need to manually kill
  `query-named-block-nodes.sh`):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` operations.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of
  the blockjob, since it reaches the "concluded" state before crashing.
  I initially assumed the cause was one of the following:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
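
  The window is easy to model outside QEMU. The sketch below is plain
  Python, not QEMU code; node names and the data structures are
  illustrative stand-ins for `graph_bdrv_states` and the per-disk
  chain. It mimics the pre-fix sequence: the filter node is torn down
  after leaving the disk's chain but before leaving the global list,
  so a concurrent query trips over it:

  ```python
  class Node:
      def __init__(self, name, children=()):
          self.name = name
          self.children = list(children)  # set to None when torn down

  # Global list walked by query-named-block-nodes (graph_bdrv_states).
  graph_bdrv_states = []
  # Per-disk chain for vdb.
  disk_chain = []

  def add_node(node):
      disk_chain.append(node)
      graph_bdrv_states.append(node)

  def query_named_block_nodes():
      # bdrv_refresh_filename walks bs->children for every named node;
      # a torn-down node has no children list (NULL in C) -> crash.
      return {n.name: len(n.children) for n in graph_bdrv_states}

  fmt = Node("vdb-format", children=["vdb-file"])
  add_node(fmt)

  # block-stream inserts a copy-on-read filter above the format node.
  cor = Node("cor-filter", children=["vdb-format"])
  add_node(cor)

  # Pre-fix completion: the filter leaves the disk chain and is torn
  # down, but lingers on the global list until the job is dismissed.
  disk_chain.remove(cor)
  cor.children = None

  try:
      query_named_block_nodes()
      crashed = False
  except TypeError:  # stands in for the segfault
      crashed = True
  assert crashed

  # The upstream fix keeps the node intact until it also leaves the
  # global list, closing the window:
  graph_bdrv_states.remove(cor)
  assert query_named_block_nodes() == {"vdb-format": 1}
  ```

  The toy "fix" mirrors the upstream patch's effect: the two lists are
  never allowed to disagree about a live-but-torn-down node.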

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (8 preceding siblings ...)
  2026-02-09 16:19 ` Wesley Hershberger
@ 2026-02-09 16:36 ` Launchpad Bug Tracker
  2026-02-18 14:31 ` Wesley Hershberger
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Launchpad Bug Tracker @ 2026-02-09 16:36 UTC (permalink / raw)
  To: qemu-devel

** Merge proposal linked:
   https://code.launchpad.net/~whershberger/ubuntu/+source/qemu/+git/qemu/+merge/500070

** Merge proposal linked:
   https://code.launchpad.net/~whershberger/ubuntu/+source/qemu/+git/qemu/+merge/500071

** Merge proposal linked:
   https://code.launchpad.net/~whershberger/ubuntu/+source/qemu/+git/qemu/+merge/500072




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (9 preceding siblings ...)
  2026-02-09 16:36 ` Launchpad Bug Tracker
@ 2026-02-18 14:31 ` Wesley Hershberger
  2026-02-27  9:56 ` Timo Aaltonen
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-02-18 14:31 UTC (permalink / raw)
  To: qemu-devel

** Changed in: qemu (Ubuntu Resolute)
       Status: In Progress => Fix Committed




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (10 preceding siblings ...)
  2026-02-18 14:31 ` Wesley Hershberger
@ 2026-02-27  9:56 ` Timo Aaltonen
  2026-02-27 16:59 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.3) Ubuntu SRU Bot
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Timo Aaltonen @ 2026-02-27  9:56 UTC (permalink / raw)
  To: qemu-devel

Hello Wesley, or anyone else affected,

Accepted qemu into questing-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/qemu/1:10.1.0+ds-5ubuntu2.3 in a
few hours, and then in the -proposed repository.

Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will help us get this update
out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package, and change the tag from
verification-needed-questing to verification-done-questing. If it does
not fix the bug for you, please add a comment stating that, and change
the tag to verification-failed-questing. In either case, without details
of your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
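
For testers new to -proposed, the EnableProposed procedure amounts to
roughly the following sketch (assuming a questing system; the pocket
name, pin priority, and package name below are the conventional ones,
but verify them against the wiki page before use):

```shell
# Sketch of enabling -proposed per https://wiki.ubuntu.com/Testing/EnableProposed
release=questing

# Add the -proposed pocket as a package source
echo "deb http://archive.ubuntu.com/ubuntu ${release}-proposed restricted main multiverse universe" \
    | sudo tee "/etc/apt/sources.list.d/ubuntu-${release}-proposed.list"

# Pin -proposed below -updates so only explicitly requested packages
# are pulled from it
sudo tee /etc/apt/preferences.d/proposed-updates <<EOF
Package: *
Pin: release a=${release}-proposed
Pin-Priority: 400
EOF

sudo apt update
sudo apt install -t "${release}-proposed" qemu-system-x86
```

The pin keeps the rest of the system on -updates, while `apt install -t`
pulls in just the package under verification.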

** Changed in: qemu (Ubuntu Questing)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-questing

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  In Progress
Status in qemu source package in Noble:
  In Progress
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.3)
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (11 preceding siblings ...)
  2026-02-27  9:56 ` Timo Aaltonen
@ 2026-02-27 16:59 ` Ubuntu SRU Bot
  2026-03-06 10:48 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Timo Aaltonen
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Ubuntu SRU Bot @ 2026-02-27 16:59 UTC (permalink / raw)
  To: qemu-devel

All autopkgtests for the newly accepted qemu (1:10.1.0+ds-5ubuntu2.3) for questing have finished running.
The following regressions have been reported in tests triggered by the package:

architecture-properties/0.2.6 (ppc64el)
casper/25.10.2 (amd64, ppc64el)
edk2/2025.02-8ubuntu3 (amd64, armhf)
freedom-maker/0.34 (arm64)
incus/6.0.4-2 (arm64, s390x)
ironic-python-agent/10.2.0-3 (arm64)
kworkflow/20191112-1.2 (amd64)
libvirt/11.6.0-1ubuntu3.3 (arm64, ppc64el)
nova/3:32.0.0-0ubuntu1.1 (i386)
systemd/257.9-0ubuntu2.1 (armhf)


Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/questing/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (12 preceding siblings ...)
  2026-02-27 16:59 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.3) Ubuntu SRU Bot
@ 2026-03-06 10:48 ` Timo Aaltonen
  2026-03-11 17:57 ` Wesley Hershberger
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Timo Aaltonen @ 2026-03-06 10:48 UTC (permalink / raw)
  To: qemu-devel

The questing, noble, and jammy uploads need to be rebased onto the
security update that went out earlier this week.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  In Progress
Status in qemu source package in Noble:
  In Progress
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM/different disks. The race condition occurs at the end of the
  `block-stream` QMP; libvirt's handler for a completed `block-stream`
  (`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
  block-nodes` (see "libvirt trace" below for a full trace).

  This occurs in every version of QEMU shipped with Ubuntu, from 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` QMPs.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
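
  For regression testing, both affected codepaths can be exercised
  through their libvirt entry points. The sketch below is illustrative
  only: the domain name `n0`, the disk target `vdb`, and the `DRY_RUN`
  guard (which defaults to printing the commands instead of running
  them) are assumptions for this example, and `virsh backup-begin`
  needs a reasonably recent libvirt:
  ```sh
  #!/bin/bash
  # Smoke-test sketch for the two affected codepaths.
  # DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to run for real
  # against a running domain that has a pullable disk "vdb".
  domain="${DOMAIN:-n0}"

  run() {
      if [ "${DRY_RUN:-1}" = 1 ]; then
          echo "would run: $*"
      else
          "$@"
      fi
  }

  # block-stream codepath (virDomainBlockPull)
  run virsh blockpull "${domain}" vdb --wait

  # blockdev-backup codepath (virDomainBackupBegin)
  run virsh backup-begin "${domain}"
  ```
  If either command starts failing (or the domain crashes at job
  completion), the backport has likely broken one of these paths.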

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```
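
  As a small, self-contained helper, the QEMU source locations can be
  pulled out of a saved gdb backtrace with grep. The two sample frames
  below are abbreviated copies of frames #0 and #1 above (build path
  shortened for the example):
  ```sh
  #!/bin/bash
  # Extract "file.c:line" source locations from a saved gdb backtrace.
  bt_file="$(mktemp)"
  cat > "$bt_file" <<'EOF'
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0) at block/qapi.c:62
  EOF

  # one location per frame, innermost first
  grep -oE '[A-Za-z0-9_/.-]+\.c:[0-9]+' "$bt_file"
  rm -f "$bt_file"
  ```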

  The libvirt logs suggest that the crash occurs right at the end of
  the blockjob, since it reaches the "concluded" state before crashing.
  I initially assumed the cause was one of:
  - `stream_clean` frees/modifies the `cor_filter_bs` without holding a
    required lock [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the
    QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
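
  The transient node is visible from the monitor while a pull is in
  flight: `query-named-block-nodes` reports the filter as a node with
  driver `copy-on-read`. A self-contained sketch of that check against
  a canned reply (the node names here are hypothetical and the reply is
  heavily abbreviated):
  ```sh
  #!/bin/bash
  # Canned, abbreviated query-named-block-nodes reply. Node names are
  # made up, but the "copy-on-read" driver entry is what block-stream
  # inserts as the cor_filter_bs.
  reply='{"return":[{"node-name":"libvirt-2-format","drv":"qcow2"},{"node-name":"cor-filter","drv":"copy-on-read"}]}'

  # non-empty output means the transient filter node is (still) present
  echo "$reply" | grep -o '"drv":"copy-on-read"'
  ```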

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (13 preceding siblings ...)
  2026-03-06 10:48 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Timo Aaltonen
@ 2026-03-11 17:57 ` Wesley Hershberger
  2026-03-12 14:58 ` Hector CAO
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-11 17:57 UTC (permalink / raw)
  To: qemu-devel

Sorry for the delay on this; I've rebased the MPs, pushed new tags, and
uploaded test builds to my PPAs [1][2]. Still waiting on RISC-V, but
everything else is green.

[1] https://launchpad.net/~whershberger/+archive/ubuntu/lp2126951-updates
[2] https://launchpad.net/~whershberger/+archive/ubuntu/lp2126951-proposed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  In Progress
Status in qemu source package in Noble:
  In Progress
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (14 preceding siblings ...)
  2026-03-11 17:57 ` Wesley Hershberger
@ 2026-03-12 14:58 ` Hector CAO
  2026-03-12 15:10 ` Wesley Hershberger
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Hector CAO @ 2026-03-12 14:58 UTC (permalink / raw)
  To: qemu-devel

Hello Wesley,

I assume this needs a re-upload from Nick or someone else?




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (15 preceding siblings ...)
  2026-03-12 14:58 ` Hector CAO
@ 2026-03-12 15:10 ` Wesley Hershberger
  2026-03-19 12:49 ` [Bug 2126951] Please test proposed package Timo Aaltonen
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-12 15:10 UTC (permalink / raw)
  To: qemu-devel

Yes, that's correct. The RISC-V test build failures look spurious (no
logs); I've restarted them.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  In Progress
Status in qemu source package in Noble:
  In Progress
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM/different disks. The race condition occurs at the end of the
  `block-stream` QMP; libvirt's handler for a completed `block-stream`
  (`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
  block-nodes` (see "libvirt trace" below for a full trace).

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-
  server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush during `blockdev-backup` commands.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since the job reaches the "concluded" state before the crash. I initially assumed the cause was one of the following:
  - `stream_clean` frees or modifies the `cor_filter_bs` without holding a required lock [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that a filter bs's QLIST of children could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
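  As a toy Python model of the race window (this is not QEMU code; `Node`,
  `graph_nodes`, and `query_named_block_nodes` below are simplified
  stand-ins for QEMU's `BlockDriverState`, `graph_bdrv_states`, and
  `bdrv_named_nodes_list`):

  ```python
  class Node:
      def __init__(self, name, child=None, is_filter=False):
          self.name = name
          self.child = child        # stand-in for the node's child link
          self.is_filter = is_filter

  graph_nodes = []                  # stand-in for the global graph_bdrv_states

  def query_named_block_nodes():
      # Visits every node in the global list; for a filter node it
      # follows the child link unconditionally.
      return [n.child.name for n in graph_nodes if n.is_filter]

  # block-stream inserts a copy-on-read filter above the base node
  base = Node("base")
  cor = Node("cor-filter", child=base, is_filter=True)
  graph_nodes.extend([base, cor])

  assert query_named_block_nodes() == ["base"]  # fine while the job runs

  # Job completion, step 1: the filter is unlinked from the disk's chain...
  cor.child = None
  # ...but step 2 (removal from the global list) only happens later.
  # A query landing in between dereferences the cleared link:
  try:
      query_named_block_nodes()
      crashed = False
  except AttributeError:            # toy analogue of the segfault
      crashed = True
  assert crashed

  # Step 2 closes the window; queries are safe again.
  graph_nodes.remove(cor)
  assert query_named_block_nodes() == []
  ```

  The upstream fix corresponds to reordering these steps so the node is
  never reachable from the global list in a half-unlinked state.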

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Please test proposed package
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (16 preceding siblings ...)
  2026-03-12 15:10 ` Wesley Hershberger
@ 2026-03-19 12:49 ` Timo Aaltonen
  2026-03-19 12:51 ` Timo Aaltonen
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Timo Aaltonen @ 2026-03-19 12:49 UTC (permalink / raw)
  To: qemu-devel

Hello Wesley, or anyone else affected,

Accepted qemu into questing-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/qemu/1:10.1.0+ds-5ubuntu2.5 in a
few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from verification-needed-
questing to verification-done-questing. If it does not fix the bug for
you, please add a comment stating that, and change the tag to
verification-failed-questing. In either case, without details of your
testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: qemu (Ubuntu Noble)
       Status: In Progress => Fix Committed

** Tags added: verification-needed-noble

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.
  (Full bug description quoted above.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Please test proposed package
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (17 preceding siblings ...)
  2026-03-19 12:49 ` [Bug 2126951] Please test proposed package Timo Aaltonen
@ 2026-03-19 12:51 ` Timo Aaltonen
  2026-03-19 12:53 ` Timo Aaltonen
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Timo Aaltonen @ 2026-03-19 12:51 UTC (permalink / raw)
  To: qemu-devel

Hello Wesley, or anyone else affected,

Accepted qemu into noble-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/qemu/1:8.2.2+ds-0ubuntu1.14 in a
few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from verification-needed-
noble to verification-done-noble. If it does not fix the bug for you,
please add a comment stating that, and change the tag to verification-
failed-noble. In either case, without details of your testing we will
not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: qemu (Ubuntu Jammy)
       Status: In Progress => Fix Committed

** Tags added: verification-needed-jammy

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.
  (Full bug description quoted above.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Please test proposed package
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (18 preceding siblings ...)
  2026-03-19 12:51 ` Timo Aaltonen
@ 2026-03-19 12:53 ` Timo Aaltonen
  2026-03-19 14:30 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Michael Tokarev
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Timo Aaltonen @ 2026-03-19 12:53 UTC (permalink / raw)
  To: qemu-devel

Hello Wesley, or anyone else affected,

Accepted qemu into jammy-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/qemu/1:6.2+dfsg-2ubuntu6.29 in a
few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from verification-needed-
jammy to verification-done-jammy. If it does not fix the bug for you,
please add a comment stating that, and change the tag to verification-
failed-jammy. In either case, without details of your testing we will
not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM/different disks. The race condition occurs at the end of the
  `block-stream` QMP command; libvirt's handler for a completed
  `block-stream` (`qemuBlockJobProcessEventCompletedPull` [1]) calls
  `query-named-block-nodes` (see "libvirt trace" below for a full trace).

  This occurs in every version of QEMU shipped with Ubuntu, 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the scripts to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` QMP commands.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
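  The ordering the patch restores can be sketched with a small toy model
  (a hypothetical sketch with made-up names like `fixed_demo`; this is
  not the actual patch): the node stays fully wired until it leaves both
  the global and the per-disk list, and only then are its child links
  cleared, so a global iterator can never observe a half-torn-down node.

  ```c
  /* Toy model of the fixed teardown ordering; illustrative names only. */
  #include <assert.h>
  #include <stddef.h>
  #include <string.h>

  typedef struct Node {
      const char *name;
      struct Node *child;
      struct Node *next_global;
      struct Node *next_disk;
  } Node;

  static Node *global_list;
  static Node *disk_list;

  static void add_node(Node *n)
  {
      n->next_global = global_list; global_list = n;
      n->next_disk = disk_list;     disk_list = n;
  }

  static void unlink_global(Node *n)
  {
      Node **p = &global_list;
      while (*p && *p != n) {
          p = &(*p)->next_global;
      }
      if (*p) {
          *p = n->next_global;
      }
  }

  static void unlink_disk(Node *n)
  {
      Node **p = &disk_list;
      while (*p && *p != n) {
          p = &(*p)->next_disk;
      }
      if (*p) {
          *p = n->next_disk;
      }
  }

  /* Fixed ordering: leave both lists first, clear children afterwards. */
  static void drop_node(Node *n)
  {
      unlink_global(n);
      unlink_disk(n);
      n->child = NULL;   /* safe: no iterator can reach n any more */
  }

  static Node *find_global(const char *name)
  {
      for (Node *n = global_list; n; n = n->next_global) {
          if (strcmp(n->name, name) == 0) {
              return n;
          }
      }
      return NULL;
  }

  /* Returns 1 if global iterators can no longer see the dropped node. */
  static int fixed_demo(void)
  {
      static Node base, cor;
      global_list = disk_list = NULL;
      base = (Node){ "base", NULL, NULL, NULL };
      cor  = (Node){ "cor-filter", &base, NULL, NULL };
      add_node(&base);
      add_node(&cor);
      drop_node(&cor);   /* node disappears from both lists at once */
      return find_global("cor-filter") == NULL;
  }

  int main(void)
  {
      assert(fixed_demo() == 1);
      return 0;
  }
  ```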

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (19 preceding siblings ...)
  2026-03-19 12:53 ` Timo Aaltonen
@ 2026-03-19 14:30 ` Michael Tokarev
  2026-03-19 14:37 ` Wesley Hershberger
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Michael Tokarev @ 2026-03-19 14:30 UTC (permalink / raw)
  To: qemu-devel

Shouldn't this bug and fix be reflected in the upstream qemu too?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM/different disks. The race condition occurs at the end of the
  `block-stream` QMP command; libvirt's handler for a completed
  `block-stream` (`qemuBlockJobProcessEventCompletedPull` [1]) calls
  `query-named-block-nodes` (see "libvirt trace" below for a full trace).

  This occurs in every version of QEMU shipped with Ubuntu, 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the scripts to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` QMP commands.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (20 preceding siblings ...)
  2026-03-19 14:30 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Michael Tokarev
@ 2026-03-19 14:37 ` Wesley Hershberger
  2026-03-19 20:18 ` Michael Tokarev
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-19 14:37 UTC (permalink / raw)
  To: qemu-devel

Hey Michael,

The fix for this landed as 9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
upstream, present in v10.2.0-rc1 and released in v10.2.0. I'll add the
upstream link to the bug description.

** Description changed:

  [ Impact ]
  
  When running `block-stream` and `query-named-block-nodes` concurrently,
  a null-pointer dereference causes QEMU to segfault.
  
  The original reporter of this issue experienced the bug while performing
  concurrent libvirt `virDomainBlockPull` calls on the same VM/different
  disks. The race condition occurs at the end of the `block-stream` QMP;
  libvirt's handler for a completed `block-stream`
  (`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-block-
  nodes` (see "libvirt trace" below for a full trace).
  
  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10.
  
  [1] qemuBlockJobProcessEventCompletedPull
  
  [ Test Plan ]
  
  ```
  sudo apt install libvirt-daemon-system virtinst
  ```
  
  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash
  
  while true; do
-     virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
+     virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```
  
  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash
  
  set -ex
  
  domain="$1"
  
  if [ -z "${domain}" ]; then
-     echo "Missing domain name"
-     exit 1
+     echo "Missing domain name"
+     exit 1
  fi
  
  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!
  
  while [ -n "$(virsh list --uuid)" ]; do
-     snap="snap0-$(uuidgen)"
- 
-     virsh snapshot-create-as "${domain}" \
-         --name "${snap}" \
-         --disk-only file= \
-         --diskspec vda,snapshot=no \
-         --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
-         --atomic \
-         --no-metadata
- 
-     virsh blockpull "${domain}" vdb
- 
-     while bjr=$(virsh blockjob "$domain" vdb); do
-         if [[ "$bjr" == *"No current block job for"* ]] ; then
-             break;
-         fi;
-     done;
+     snap="snap0-$(uuidgen)"
+ 
+     virsh snapshot-create-as "${domain}" \
+         --name "${snap}" \
+         --disk-only file= \
+         --diskspec vda,snapshot=no \
+         --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
+         --atomic \
+         --no-metadata
+ 
+     virsh blockpull "${domain}" vdb
+ 
+     while bjr=$(virsh blockjob "$domain" vdb); do
+         if [[ "$bjr" == *"No current block job for"* ]] ; then
+             break;
+         fi;
+     done;
  done
  
  kill "${query_pid}"
  ```
  
  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash
  
  set -ex
  
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-
  server-cloudimg-amd64.img
  
  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
  
  touch network-config
  touch meta-data
  touch user-data
  
  virt-install \
-   -n n0 \
-   --description "Test noble minimal" \
-   --os-variant=ubuntu24.04 \
-   --ram=1024 --vcpus=2 \
-   --import \
-   --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
-   --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
-   --graphics none \
-   --network network=default \
-   --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
+   -n n0 \
+   --description "Test noble minimal" \
+   --os-variant=ubuntu24.04 \
+   --ram=1024 --vcpus=2 \
+   --import \
+   --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
+   --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
+   --graphics none \
+   --network network=default \
+   --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```
  
  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash n0
  ```
  
  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"
  
  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running
  
  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```
  
  [ Where problems could occur ]
  
  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would expect
  to see failures when executing these QMP commands (or the libvirt APIs
  that use them, `virDomainBlockPull` and `virDomainBackupBegin` [3][4]).
  
  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` QMPs.
  
  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue that
  was later resolved in [6] (see the end of [7] and replies for upstream
  discussion). Both [5] and [6] are present in QEMU 6.2.0, so the
  assumptions that led us to the upstream solution hold for Jammy.
  
  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html
  
  [ Other info ]
  
  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```
  
  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]
  
  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.
  
  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the `cor_filter_bs`,
  added during a `block-stream`) was removed from the list of block nodes
  _for the disk_ when the operation finished, but not removed from the
  global list of block nodes until later (this is the window for the
  race). The patch keeps the block node in the disk's list until it is
  dropped at the end of the blockjob.
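
  The removed-from-the-disk-but-still-global window can be sketched with a
  toy model. This is not QEMU code: `Node`, `query_nodes` and `run_demo`
  below are illustrative stand-ins for `BlockDriverState`,
  `query-named-block-nodes` and the blockjob completion path:
  ```c
  /* Toy model of the race window: a filter node is unlinked from the
   * per-disk chain but remains reachable via the global node list, so
   * a concurrent query that walks the global list can observe it in a
   * half-torn-down state. All names here are illustrative, not QEMU's. */
  #include <assert.h>
  #include <stddef.h>
  #include <stdio.h>

  typedef struct Node {
      const char *name;
      int is_filter;             /* filter nodes are expected to have a child */
      struct Node *child;        /* stand-in for the per-disk children chain */
      struct Node *global_next;  /* link in the graph_bdrv_states-like list */
  } Node;

  static Node *global_list;

  static void global_insert(Node *n)
  {
      n->global_next = global_list;
      global_list = n;
  }

  /* Stand-in for query-named-block-nodes: walks every named node and,
   * like bdrv_refresh_filename(), assumes a filter always has a child.
   * Returns how many nodes were seen in the dangerous state. */
  static int query_nodes(void)
  {
      int unsafe = 0;
      for (Node *n = global_list; n != NULL; n = n->global_next) {
          if (n->is_filter && n->child == NULL) {
              printf("node %s: filter with no child (crash window)\n", n->name);
              unsafe++;
          }
      }
      return unsafe;
  }

  /* Returns the number of unsafe observations made inside the window. */
  static int run_demo(void)
  {
      Node base = { "base", 0, NULL, NULL };
      Node cor  = { "cor_filter", 1, &base, NULL };
      global_insert(&base);
      global_insert(&cor);

      assert(query_nodes() == 0);  /* while the job runs: consistent */

      /* Job completion: the filter is unlinked from the disk's chain
       * first, but stays on the global list until later... */
      cor.child = NULL;

      /* ...so a concurrent query in that window sees the torn-down node. */
      int unsafe = query_nodes();

      global_list = NULL;          /* reset so the demo can be re-run */
      return unsafe;
  }

  int main(void)
  {
      printf("unsafe observations: %d\n", run_demo());
      return 0;
  }
  ```
  In real QEMU the "unsafe observation" is the NULL dereference in
  `bdrv_refresh_filename`; the upstream fix closes the window rather
  than adding a NULL check to the walker.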
  
  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
  [6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  
  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]
  
  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM/different disks. The race condition occurs at the end of the
  `block-stream` QMP; libvirt's handler for a completed `block-stream`
  (`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
  block-nodes` (see "libvirt trace" below for a full trace).

  This occurs in every version of QEMU shipped with Ubuntu, from 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the scripts to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` QMPs.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
  [6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (21 preceding siblings ...)
  2026-03-19 14:37 ` Wesley Hershberger
@ 2026-03-19 20:18 ` Michael Tokarev
  2026-03-20  2:23 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.5) Ubuntu SRU Bot
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Michael Tokarev @ 2026-03-19 20:18 UTC (permalink / raw)
  To: qemu-devel

Ah, ok, I didn't look that far, thinking it should be a recent change.
If that's the case, this commit should be picked up for qemu 10.0.x
series.  It's interesting I missed it during 10.2 RCs.

Thanks!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM/different disks. The race condition occurs at the end of the
  `block-stream` QMP; libvirt's handler for a completed `block-stream`
  (`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
  block-nodes` (see "libvirt trace" below for a full trace).

  This occurs in every version of QEMU shipped with Ubuntu, from 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the scripts to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` QMPs.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assumed that this was one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
  [6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.5)
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (22 preceding siblings ...)
  2026-03-19 20:18 ` Michael Tokarev
@ 2026-03-20  2:23 ` Ubuntu SRU Bot
  2026-03-20  3:11 ` [Bug 2126951] Autopkgtest regression report (qemu/1:8.2.2+ds-0ubuntu1.14) Ubuntu SRU Bot
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Ubuntu SRU Bot @ 2026-03-20  2:23 UTC (permalink / raw)
  To: qemu-devel

All autopkgtests for the newly accepted qemu (1:10.1.0+ds-5ubuntu2.5) for questing have finished running.
The following regressions have been reported in tests triggered by the package:

freedom-maker/0.34 (armhf)
multipath-tools/0.11.1-3ubuntu2 (ppc64el)
nova/unknown (ppc64el)
qemu/1:10.1.0+ds-5ubuntu2.5 (armhf)
systemd/257.9-0ubuntu2.1 (armhf)


Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/questing/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM/different disks. The race condition occurs at the end of the
  `block-stream` QMP; libvirt's handler for a completed `block-stream`
  (`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
  block-nodes` (see "libvirt trace" below for a full trace).

  This occurs in every version of QEMU shipped with Ubuntu, from 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the scripts to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` QMPs.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since the job reaches the "concluded" state before crashing. I initially assumed the cause was one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Autopkgtest regression report (qemu/1:8.2.2+ds-0ubuntu1.14)
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (23 preceding siblings ...)
  2026-03-20  2:23 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.5) Ubuntu SRU Bot
@ 2026-03-20  3:11 ` Ubuntu SRU Bot
  2026-03-20  5:12 ` [Bug 2126951] Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.29) Ubuntu SRU Bot
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Ubuntu SRU Bot @ 2026-03-20  3:11 UTC (permalink / raw)
  To: qemu-devel

All autopkgtests for the newly accepted qemu (1:8.2.2+ds-0ubuntu1.14) for noble have finished running.
The following regressions have been reported in tests triggered by the package:

cryptsetup/2:2.7.0-1ubuntu4.2 (s390x)
freedom-maker/0.33 (armhf)
glance/2:28.1.0-0ubuntu1.2 (armhf)
glib2.0/2.80.0-6ubuntu3.8 (ppc64el, s390x)
glib2.0/unknown (armhf)


Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/noble/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Committed
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Committed

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.29)
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (24 preceding siblings ...)
  2026-03-20  3:11 ` [Bug 2126951] Autopkgtest regression report (qemu/1:8.2.2+ds-0ubuntu1.14) Ubuntu SRU Bot
@ 2026-03-20  5:12 ` Ubuntu SRU Bot
  2026-03-26 19:56 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Ubuntu SRU Bot @ 2026-03-20  5:12 UTC (permalink / raw)
  To: qemu-devel

All autopkgtests for the newly accepted qemu (1:6.2+dfsg-2ubuntu6.29) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

livecd-rootfs/2.765.55 (amd64, ppc64el)
nova/unknown (ppc64el)
systemd/249.11-0ubuntu3.17 (armhf)
systemd/unknown (ppc64el)


Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (25 preceding siblings ...)
  2026-03-20  5:12 ` [Bug 2126951] Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.29) Ubuntu SRU Bot
@ 2026-03-26 19:56 ` Wesley Hershberger
  2026-03-26 19:56 ` Wesley Hershberger
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-26 19:56 UTC (permalink / raw)
  To: qemu-devel

The crash is reproducible with the current version:

wesley@qv0:~$ apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:10.1.0+ds-5ubuntu2.4
  Candidate: 1:10.1.0+ds-5ubuntu2.4
  Version table:
 *** 1:10.1.0+ds-5ubuntu2.4 500
        500 http://archive.ubuntu.com/ubuntu questing-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu questing-security/main amd64 Packages
        100 /var/lib/dpkg/status
     1:10.1.0+ds-5ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu questing/main amd64 Packages

### Verification done Questing ###

wesley@qv0:~$ apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:10.1.0+ds-5ubuntu2.5
  Candidate: 1:10.1.0+ds-5ubuntu2.5
  Version table:
 *** 1:10.1.0+ds-5ubuntu2.5 100
        100 http://archive.ubuntu.com/ubuntu questing-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:10.1.0+ds-5ubuntu2.4 500
        500 http://archive.ubuntu.com/ubuntu questing-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu questing-security/main amd64 Packages
     1:10.1.0+ds-5ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu questing/main amd64 Packages

wesley@qv0:~$ ./provision.sh
+ wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
--2026-03-26 08:56:59--  https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
Resolving cloud-images.ubuntu.com (cloud-images.ubuntu.com)... 185.125.190.40, 185.125.190.37, 2620:2d:4000:1::17, ...
Connecting to cloud-images.ubuntu.com (cloud-images.ubuntu.com)|185.125.190.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 629987328 (601M) [application/octet-stream]
Saving to: ‘noble-server-cloudimg-amd64.img.1’

noble-server-cloudimg-amd64.img.1
100%[===================================================================>]
600.80M  1.60MB/s    in 21s

2026-03-26 08:57:21 (28.4 MB/s) - ‘noble-server-cloudimg-amd64.img.1’
saved [629987328/629987328]

+ sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
+ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
Formatting '/var/lib/libvirt/images/n0-blk0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
+ touch network-config
+ touch meta-data
+ touch user-data
+ virt-install -n n0 --description 'Test noble minimal' --os-variant=ubuntu24.04 --ram=1024 --vcpus=2 --import --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 --graphics none --network network=default --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
...

wesley@qv0:~$ virsh list
 Id   Name   State
----------------------
 2    n0     running

wesley@qv0:~$ ./blockrebase-crash.sh n0
...

My connection to the testbed died about 5 hours into the test; it
filled up 10GB with snapshots and didn't crash:

wesley@qv0:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           391M  1.4M  389M   1% /run
efivarfs        256K   44K  208K  18% /sys/firmware/efi/efivars
/dev/sda1        19G   11G  8.3G  55% /

wesley@qv0:~$ virsh list
 Id   Name   State
----------------------
 2    n0     running

### Verification done Questing ###

** Tags removed: verification-needed-questing
** Tags added: verification-done-questing

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Released

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue hit the bug while performing
  concurrent libvirt `virDomainBlockPull` calls on different disks of
  the same VM. The race occurs at the end of the `block-stream` QMP
  command: libvirt's handler for a completed `block-stream`
  (`qemuBlockJobProcessEventCompletedPull` [1]) calls
  `query-named-block-nodes` (see "libvirt trace" below for the full
  call chain).

  This occurs in every version of QEMU shipped with Ubuntu, from 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```
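
  As a side note, the inner `virsh blockjob` wait loop above busy-polls
  with no delay. The same pattern can be factored into a small helper
  (hypothetical name `poll_until`, with a one-second sleep and a timeout
  added; a sketch, not part of the test plan):
  ```sh
  #!/bin/bash
  # poll_until PATTERN CMD...: rerun CMD until its output matches PATTERN
  # or POLL_TIMEOUT seconds (default 60) elapse. Returns 0 on match,
  # 1 on timeout. Illustrates the blockjob wait loop, with backoff.
  poll_until() {
      local pattern="$1"; shift
      local timeout="${POLL_TIMEOUT:-60}"
      local waited=0
      while (( waited < timeout )); do
          if "$@" 2>&1 | grep -q "$pattern"; then
              return 0
          fi
          sleep 1
          (( waited += 1 ))
      done
      return 1
  }

  # Example (a stand-in for waiting on `virsh blockjob n0 vdb`):
  touch /tmp/poll-demo
  poll_until "poll-demo" ls /tmp && echo "condition met"
  ```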

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
  ```

  And run the scripts to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush during `blockdev-backup` commands.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of
  the blockjob, since the job reaches the "concluded" state before
  crashing. I initially assumed the cause was one of the following:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.
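
  For illustration, the copy-on-read filter shows up as an extra named
  node in `query-named-block-nodes` output while the stream runs. A
  quick way to eyeball the node list (the node names below are made up;
  real libvirt-assigned names vary):
  ```sh
  #!/bin/bash
  # Hedged sketch: extract node names from query-named-block-nodes-style
  # JSON. The sample document is inlined and purely illustrative; in
  # practice it would come from
  #   virsh qemu-monitor-command n0 query-named-block-nodes
  json='[{"node-name":"libvirt-2-format","drv":"qcow2"},
         {"node-name":"libvirt-2-storage","drv":"file"},
         {"node-name":"libvirt-2-filter","drv":"copy-on-read"}]'

  # Print one node name per line; the "copy-on-read" entry stands in
  # for the transient cor_filter_bs discussed above.
  echo "$json" | grep -o '"node-name":"[^"]*"' | cut -d'"' -f4
  ```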

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (26 preceding siblings ...)
  2026-03-26 19:56 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
@ 2026-03-26 19:56 ` Wesley Hershberger
  2026-03-26 19:57 ` Wesley Hershberger
  2026-03-26 19:57 ` Wesley Hershberger
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-26 19:56 UTC (permalink / raw)
  To: qemu-devel

The crash is reproducible with the current version:

wesley@nv0:~$ apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:8.2.2+ds-0ubuntu1.13
  Candidate: 1:8.2.2+ds-0ubuntu1.13
  Version table:
 *** 1:8.2.2+ds-0ubuntu1.13 500
        500 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages
        100 /var/lib/dpkg/status
     1:8.2.2+ds-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu noble/main amd64 Packages

### Verification done Noble ###

wesley@nv0:~$ apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:8.2.2+ds-0ubuntu1.14
  Candidate: 1:8.2.2+ds-0ubuntu1.14
  Version table:
 *** 1:8.2.2+ds-0ubuntu1.14 100
        100 http://archive.ubuntu.com/ubuntu noble-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:8.2.2+ds-0ubuntu1.13 500
        500 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages
     1:8.2.2+ds-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu noble/main amd64 Packages

wesley@nv0:~$ ./provision.sh
+ wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
--2026-03-26 08:56:55--  https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
Resolving cloud-images.ubuntu.com (cloud-images.ubuntu.com)... 185.125.190.37, 185.125.190.40, 2620:2d:4000:1::1a, ...
Connecting to cloud-images.ubuntu.com (cloud-images.ubuntu.com)|185.125.190.37|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 629987328 (601M) [application/octet-stream]
Saving to: ‘noble-server-cloudimg-amd64.img.1’

noble-server-cloudimg-amd64.img.1
100%[====================================================================>]
600.80M  26.4MB/s    in 38s

2026-03-26 08:57:34 (16.0 MB/s) - ‘noble-server-cloudimg-amd64.img.1’
saved [629987328/629987328]

+ sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
+ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
Formatting '/var/lib/libvirt/images/n0-blk0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
+ touch network-config
+ touch meta-data
+ touch user-data
+ virt-install -n n0 --description 'Test noble minimal' --os-variant=ubuntu24.04 --ram=1024 --vcpus=2 --import --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 --graphics none --network network=default --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
...

wesley@nv0:~$ virsh list
 Id   Name   State
----------------------
 1    n0     running

wesley@nv0:~$ ./blockrebase-crash.sh n0
...

My connection to the testbed died about 5 hours into the test; it
filled up 8GB with snapshots and didn't crash:

wesley@nv0:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           391M  1.3M  390M   1% /run
efivarfs        256K   44K  208K  18% /sys/firmware/efi/efivars
/dev/sda1        19G  9.8G  8.6G  54% /
...

wesley@nv0:~$ virsh list
 Id   Name   State
----------------------
 1    n0     running

### Verification done Noble ###

** Tags removed: verification-needed-noble
** Tags added: verification-done-noble


* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (27 preceding siblings ...)
  2026-03-26 19:56 ` Wesley Hershberger
@ 2026-03-26 19:57 ` Wesley Hershberger
  2026-03-26 19:57 ` Wesley Hershberger
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-26 19:57 UTC (permalink / raw)
  To: qemu-devel

The crash is reproducible with the current version:

wesley@jv0:~$ apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:6.2+dfsg-2ubuntu6.28
  Candidate: 1:6.2+dfsg-2ubuntu6.28
  Version table:
 *** 1:6.2+dfsg-2ubuntu6.28 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     1:6.2+dfsg-2ubuntu6 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

### Verification done Jammy ###

wesley@jv0:~$ apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:6.2+dfsg-2ubuntu6.29
  Candidate: 1:6.2+dfsg-2ubuntu6.29
  Version table:
 *** 1:6.2+dfsg-2ubuntu6.29 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:6.2+dfsg-2ubuntu6.28 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     1:6.2+dfsg-2ubuntu6 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

wesley@jv0:~$ ./provision.sh
+ wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
--2026-03-26 08:56:52--  https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img
Resolving cloud-images.ubuntu.com (cloud-images.ubuntu.com)... 185.125.190.37, 185.125.190.40, 2620:2d:4000:1::1a, ...
Connecting to cloud-images.ubuntu.com (cloud-images.ubuntu.com)|185.125.190.37|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 629987328 (601M) [application/octet-stream]
Saving to: ‘noble-server-cloudimg-amd64.img.1’

noble-server-cloudimg-amd64.img.1
100%[=========================================================================>]
600.80M  22.0MB/s    in 42s

2026-03-26 08:57:35 (14.2 MB/s) - ‘noble-server-cloudimg-amd64.img.1’
saved [629987328/629987328]

+ sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
+ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G
Formatting '/var/lib/libvirt/images/n0-blk0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
+ touch network-config
+ touch meta-data
+ touch user-data
+ virt-install -n n0 --description 'Test noble minimal' --os-variant=ubuntu24.04 --ram=1024 --vcpus=2 --import --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 --graphics none --network network=default --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
...

wesley@jv0:~$ virsh list
 Id   Name   State
----------------------
 2    n0     running

wesley@jv0:~$ ./blockrebase-crash.sh n0
...

My connection to the testbed died about 5 hours into the test; it
filled up 8GB with snapshots and didn't crash:

wesley@jv0:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        20G  9.0G   11G  47% /

wesley@jv0:~$ virsh list
 Id   Name   State
----------------------
 2    n0     running

### Verification done Jammy ###

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Jammy:
  Fix Committed
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Plucky:
  Won't Fix
Status in qemu source package in Questing:
  Fix Committed
Status in qemu source package in Resolute:
  Fix Released

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  The original reporter of this issue experienced the bug while
  performing concurrent libvirt `virDomainBlockPull` calls on the same
  VM/different disks. The race condition occurs at the end of the
  `block-stream` QMP; libvirt's handler for a completed `block-stream`
  (`qemuBlockJobProcessEventCompletedPull` [1]) calls `query-named-
  block-nodes` (see "libvirt trace" below for a full trace).

  This occurs in every version of QEMU shipped with Ubuntu, from 22.04
  through 25.10.

  [1] qemuBlockJobProcessEventCompletedPull

  [ Test Plan ]

  ```
  sudo apt install libvirt-daemon-system virtinst
  ```

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query-named-block-nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  `provision.sh` (`Ctrl + ]` after boot):
  ```sh
  #!/bin/bash

  set -ex

  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data=user-data,meta-data=meta-data,network-config=network-config
  ```

  And run the scripts to cause the crash (you may need to manually kill
  query-named-block-nodes.sh):
  ```sh
  chmod 755 provision.sh blockrebase-crash.sh query-named-block-nodes.sh
  ./provision.sh
  ./blockrebase-crash.sh n0
  ```

  Expected behavior: `blockrebase-crash.sh` runs until "No space left on
  device"

  Actual behavior: QEMU crashes after a few iterations:
  ```
  Block Pull: [81.05 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  Block Pull: [97.87 %]+ bjr=
  + [[ '' == *\N\o\ \c\u\r\r\e\n\t\ \b\l\o\c\k\ \j\o\b\ \f\o\r* ]]
  ++ virsh blockjob n0 vdb
  error: Unable to read from monitor: Connection reset by peer
  error: Unable to read from monitor: Connection reset by peer
  + bjr=
  ++ virsh list --uuid
  + '[' -n 4eed8ba4-300b-4488-a520-510e5b544f57 ']'
  ++ uuidgen
  + snap=snap0-88be23e5-696c-445d-870a-abe5f7df56c0
  + virsh snapshot-create-as n0 --name snap0-88be23e5-696c-445d-870a-abe5f7df56c0 --disk-only file= --diskspec vda,snapshot=no --diskspec vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_snap0-88be23e5-696c-445d-870a-abe5f7df56c0.qcow2 --atomic --no-metadata
  error: Requested operation is not valid: domain is not running
  Domain snapshot snap0-88be23e5-696c-445d-870a-abe5f7df56c0 created
  + virsh blockpull n0 vdb
  error: Requested operation is not valid: domain is not running
  error: Requested operation is not valid: domain is not running

  wesley@nv0:~$ error: Requested operation is not valid: domain is not running
  ```

  [ Where problems could occur ]

  The only codepaths affected by this change are `block-stream` and
  `blockdev-backup` [1][2]. If the code is somehow broken, we would
  expect to see failures when executing these QMP commands (or the
  libvirt APIs that use them, `virDomainBlockPull` and
  `virDomainBackupBegin` [3][4]).

  As noted in the upstream commit message, the change does cause an
  additional flush to occur during `blockdev-backup` QMPs.

  The patch that was ultimately merged upstream was a revert of most of
  [5]. _That_ patch was a workaround for a blockdev permissions issue
  that was later resolved in [6] (see the end of [7] and replies for
  upstream discussion). Both [5] and [6] are present in QEMU 6.2.0, so
  the assumptions that led us to the upstream solution hold for Jammy.

  [1] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.block-stream
  [2] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#command-QMP-block-core.blockdev-backup
  [3] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull
  [4] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
  [5] https://gitlab.com/qemu-project/qemu/-/commit/3108a15cf09
  [6] https://gitlab.com/qemu-project/qemu/-/commit/3860c0201924d
  [7] https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg06800.html

  [ Other info ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of
  the blockjob, since it reaches the "concluded" state before
  crashing. I assumed the cause was one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  Ultimately the fix was neither of these [4]; `bdrv_refresh_filename`
  should not be able to observe a NULL list of children.

  `query-named-block-nodes` iterates the global list of block nodes
  `graph_bdrv_states` [5]. The offending block node (the
  `cor_filter_bs`, added during a `block-stream`) was removed from the
  list of block nodes _for the disk_ when the operation finished, but
  not removed from the global list of block nodes until later (this is
  the window for the race). The patch keeps the block node in the disk's
  list until it is dropped at the end of the blockjob.

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340
  [4] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
  [5] https://gitlab.com/qemu-project/qemu/-/blob/v10.1.0/block.c?ref_type=tags#L72
  [6] https://gitlab.com/qemu-project/qemu/-/commit/9dbfd4e28dd11a83f54c371fade8d49a63d6dc1e
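
  For illustration only, here is a much-simplified C model of that race
  window. The names (`Node`, `global_insert`, `refresh_filename_ok`) are
  invented for this sketch and are not QEMU's real `BlockDriverState`
  API; the point is only the ordering: the filter leaves the disk's
  chain before it leaves the global list, so an iteration of the global
  list in between sees a filter with no child.

  ```c
  #include <stdbool.h>
  #include <stdio.h>

  /* Hypothetical, much-simplified block node: every named node is on a
   * global singly-linked list, and a filter also points at one child. */
  typedef struct Node {
      const char *name;
      struct Node *child;        /* filter's child; NULL once detached */
      struct Node *next_global;  /* link in the global named-node list */
  } Node;

  /* Stand-in for QEMU's graph_bdrv_states, the list that
   * query-named-block-nodes iterates. */
  static Node *global_nodes;

  static void global_insert(Node *n)
  {
      n->next_global = global_nodes;
      global_nodes = n;
  }

  /* Model of the filter case in bdrv_refresh_filename(): it assumes a
   * filter always has a child; returns false where the real code
   * dereferenced NULL. */
  static bool refresh_filename_ok(const Node *n)
  {
      return n->child != NULL;
  }

  int main(void)
  {
      Node file = { "file", NULL, NULL };
      Node cor  = { "cor-filter", &file, NULL };
      global_insert(&file);
      global_insert(&cor);

      /* End of block-stream with the buggy ordering: the copy-on-read
       * filter is detached from the disk's chain first... */
      cor.child = NULL;
      /* ...but it stays on the global list until later; this gap is the
       * window in which query-named-block-nodes can run. */

      bool hit = false;
      for (Node *n = global_nodes; n; n = n->next_global) {
          if (n == &cor && !refresh_filename_ok(n)) {
              hit = true;  /* the real code would segfault here */
          }
      }
      printf("race window observed: %s\n", hit ? "yes" : "no");
      return 0;
  }
  ```

  In this model, the upstream fix corresponds to clearing `cor.child`
  only at the same point the node is dropped from the global list, so
  no iteration can observe the half-removed state.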

  [ libvirt trace ]
  `qemuBlockJobProcessEventCompletedPull` [1]
  `qemuBlockJobProcessEventCompletedPullBitmaps` [2]
  `qemuBlockGetNamedNodeData` [3]
  `qemuMonitorBlockGetNamedNodeData` [4]
  `qemuMonitorJSONBlockGetNamedNodeData` [5]
  `qemuMonitorJSONQueryNamedBlockNodes` [6]

  [1] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n870
  [2] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_blockjob.c?h=applied/ubuntu/questing-devel#n807
  [3] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_block.c?h=applied/ubuntu/questing-devel#n2925
  [4] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor.c?h=applied/ubuntu/questing-devel#n2039
  [5] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2816
  [6] https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/qemu/qemu_monitor_json.c?h=applied/ubuntu/questing-devel#n2159

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (28 preceding siblings ...)
  2026-03-26 19:57 ` Wesley Hershberger
@ 2026-03-26 19:57 ` Wesley Hershberger
  29 siblings, 0 replies; 30+ messages in thread
From: Wesley Hershberger @ 2026-03-26 19:57 UTC (permalink / raw)
  To: qemu-devel

This bug was fixed in qemu 1:10.2.1+ds-1ubuntu2 in Resolute.

** Changed in: qemu (Ubuntu Resolute)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951




^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2026-03-26 20:07 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
2025-10-06 18:38 ` Bug Watch Updater
2025-10-07  9:38 ` Jonas Jelten
2025-10-08  7:03 ` Christian Ehrhardt
2025-10-21 20:01 ` Wesley Hershberger
2025-11-12 18:46 ` Bug Watch Updater
2026-01-12 12:27 ` Athos Ribeiro
2026-01-12 14:19 ` Wesley Hershberger
2026-02-09 16:19 ` Wesley Hershberger
2026-02-09 16:36 ` Launchpad Bug Tracker
2026-02-18 14:31 ` Wesley Hershberger
2026-02-27  9:56 ` Timo Aaltonen
2026-02-27 16:59 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.3) Ubuntu SRU Bot
2026-03-06 10:48 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Timo Aaltonen
2026-03-11 17:57 ` Wesley Hershberger
2026-03-12 14:58 ` Hector CAO
2026-03-12 15:10 ` Wesley Hershberger
2026-03-19 12:49 ` [Bug 2126951] Please test proposed package Timo Aaltonen
2026-03-19 12:51 ` Timo Aaltonen
2026-03-19 12:53 ` Timo Aaltonen
2026-03-19 14:30 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Michael Tokarev
2026-03-19 14:37 ` Wesley Hershberger
2026-03-19 20:18 ` Michael Tokarev
2026-03-20  2:23 ` [Bug 2126951] Autopkgtest regression report (qemu/1:10.1.0+ds-5ubuntu2.5) Ubuntu SRU Bot
2026-03-20  3:11 ` [Bug 2126951] Autopkgtest regression report (qemu/1:8.2.2+ds-0ubuntu1.14) Ubuntu SRU Bot
2026-03-20  5:12 ` [Bug 2126951] Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.29) Ubuntu SRU Bot
2026-03-26 19:56 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
2026-03-26 19:56 ` Wesley Hershberger
2026-03-26 19:57 ` Wesley Hershberger
2026-03-26 19:57 ` Wesley Hershberger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox