qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
@ 2025-10-06 17:21 ` Wesley Hershberger
  2025-10-06 18:38 ` Bug Watch Updater
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Wesley Hershberger @ 2025-10-06 17:21 UTC (permalink / raw)
  To: qemu-devel

** Bug watch added: gitlab.com/qemu-project/qemu/-/issues #3149
   https://gitlab.com/qemu-project/qemu/-/issues/3149

** Also affects: qemu via
   https://gitlab.com/qemu-project/qemu/-/issues/3149
   Importance: Unknown
       Status: Unknown

** Also affects: qemu (Ubuntu Questing)
   Importance: Medium
     Assignee: Wesley Hershberger (whershberger)
       Status: Confirmed

** Also affects: qemu (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: qemu (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: qemu (Ubuntu Plucky)
   Importance: Undecided
       Status: New

** Changed in: qemu (Ubuntu Jammy)
       Status: New => Confirmed

** Changed in: qemu (Ubuntu Noble)
       Status: New => Confirmed

** Changed in: qemu (Ubuntu Plucky)
       Status: New => Confirmed

** Changed in: qemu (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: qemu (Ubuntu Noble)
   Importance: Undecided => Medium

** Changed in: qemu (Ubuntu Plucky)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Unknown
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query_named_block_nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  ./blockrebase-crash n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
  2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
@ 2025-10-06 18:38 ` Bug Watch Updater
  2025-10-07  9:38 ` Jonas Jelten
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Bug Watch Updater @ 2025-10-06 18:38 UTC (permalink / raw)
  To: qemu-devel

** Changed in: qemu
       Status: Unknown => New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query_named_block_nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  ./blockrebase-crash n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
  2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
  2025-10-06 18:38 ` Bug Watch Updater
@ 2025-10-07  9:38 ` Jonas Jelten
  2025-10-08  7:03 ` Christian Ehrhardt
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Jonas Jelten @ 2025-10-07  9:38 UTC (permalink / raw)
  To: qemu-devel

** Tags added: server-todo

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query_named_block_nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  ./blockrebase-crash n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (2 preceding siblings ...)
  2025-10-07  9:38 ` Jonas Jelten
@ 2025-10-08  7:03 ` Christian Ehrhardt
  2025-10-21 20:01 ` Wesley Hershberger
  2025-11-12 18:46 ` Bug Watch Updater
  5 siblings, 0 replies; 6+ messages in thread
From: Christian Ehrhardt @ 2025-10-08  7:03 UTC (permalink / raw)
  To: qemu-devel

** Tags removed: server-todo

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query_named_block_nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  ./blockrebase-crash n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (3 preceding siblings ...)
  2025-10-08  7:03 ` Christian Ehrhardt
@ 2025-10-21 20:01 ` Wesley Hershberger
  2025-11-12 18:46 ` Bug Watch Updater
  5 siblings, 0 replies; 6+ messages in thread
From: Wesley Hershberger @ 2025-10-21 20:01 UTC (permalink / raw)
  To: qemu-devel

** Also affects: qemu (Ubuntu Resolute)
   Importance: Medium
     Assignee: Wesley Hershberger (whershberger)
       Status: Confirmed

** Changed in: qemu (Ubuntu Plucky)
     Assignee: (unassigned) => Wesley Hershberger (whershberger)

** Changed in: qemu (Ubuntu Noble)
     Assignee: (unassigned) => Wesley Hershberger (whershberger)

** Changed in: qemu (Ubuntu Jammy)
     Assignee: (unassigned) => Wesley Hershberger (whershberger)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed
Status in qemu source package in Resolute:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query_named_block_nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  ./blockrebase-crash n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes`
       [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
                   ` (4 preceding siblings ...)
  2025-10-21 20:01 ` Wesley Hershberger
@ 2025-11-12 18:46 ` Bug Watch Updater
  5 siblings, 0 replies; 6+ messages in thread
From: Bug Watch Updater @ 2025-11-12 18:46 UTC (permalink / raw)
  To: qemu-devel

** Changed in: qemu
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2126951

Title:
  `block-stream` segfault with concurrent `query-named-block-nodes`

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Jammy:
  Confirmed
Status in qemu source package in Noble:
  Confirmed
Status in qemu source package in Plucky:
  Confirmed
Status in qemu source package in Questing:
  Confirmed
Status in qemu source package in Resolute:
  Confirmed

Bug description:
  [ Impact ]

  When running `block-stream` and `query-named-block-nodes`
  concurrently, a null-pointer dereference causes QEMU to segfault.

  This occurs in every version of QEMU shipped with Ubuntu, 22.04 thru
  25.10. I have not yet reproduced the bug using an upstream build.

  I will link the upstream bug report here as soon as I've written it.

  [ Reproducer ]

  In `query-named-block-nodes.sh`:
  ```sh
  #!/bin/bash

  while true; do
      virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
  done
  ```

  In `blockrebase-crash.sh`:
  ```sh
  #!/bin/bash

  set -ex

  domain="$1"

  if [ -z "${domain}" ]; then
      echo "Missing domain name"
      exit 1
  fi

  ./query_named_block_nodes.sh "${domain}" &
  query_pid=$!

  while [ -n "$(virsh list --uuid)" ]; do
      snap="snap0-$(uuidgen)"

      virsh snapshot-create-as "${domain}" \
          --name "${snap}" \
          --disk-only file= \
          --diskspec vda,snapshot=no \
          --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
          --atomic \
          --no-metadata

      virsh blockpull "${domain}" vdb

      while bjr=$(virsh blockjob "$domain" vdb); do
          if [[ "$bjr" == *"No current block job for"* ]] ; then
              break;
          fi;
      done;
  done

  kill "${query_pid}"
  ```

  Provision (`Ctrl + ]` after boot):
  ```sh
  wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

  sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
  sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

  touch network-config
  touch meta-data
  touch user-data

  virt-install \
    -n n0 \
    --description "Test noble minimal" \
    --os-variant=ubuntu24.04 \
    --ram=1024 --vcpus=2 \
    --import \
    --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
    --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
    --graphics none \
    --network network=default \
    --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"
  ```

  And run the script to cause the crash (you may need to manually kill
  query-named-block-jobs.sh):
  ```sh
  ./blockrebase-crash n0
  ```

  [ Details ]

  Backtrace from the coredump (source at [1]):
  ```
  #0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
  #1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
      at block/qapi.c:62
  #2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
      at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
  #3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
      errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
  #4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
      at qapi/qapi-commands-block-core.c:553
  #5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
  #6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
  #7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
  #8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
      user_data=<optimized out>) at util/async.c:361
  #9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
  #12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
  #13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
  #14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
  #15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
  #16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93
  ```

  The libvirt logs suggest that the crash occurs right at the end of the blockjob, since it reaches "concluded" state before crashing. I assume that this is one of:
  - `stream_clean` is freeing/modifying the `cor_filter_bs` without holding a lock that it needs to [2][3]
  - `bdrv_refresh_filename` needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

  [1] https://git.launchpad.net/ubuntu/+source/qemu/tree/block.c?h=ubuntu/questing-devel#n8071
  [2] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n131
  [3] https://git.launchpad.net/ubuntu/+source/qemu/tree/block/stream.c?h=ubuntu/questing-devel#n340

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2126951/+subscriptions



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-11-12 19:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <175977079933.1446079.11908449148472830395.malonedeb@juju-98d295-prod-launchpad-3>
2025-10-06 17:21 ` [Bug 2126951] Re: `block-stream` segfault with concurrent `query-named-block-nodes` Wesley Hershberger
2025-10-06 18:38 ` Bug Watch Updater
2025-10-07  9:38 ` Jonas Jelten
2025-10-08  7:03 ` Christian Ehrhardt
2025-10-21 20:01 ` Wesley Hershberger
2025-11-12 18:46 ` Bug Watch Updater

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).