* [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Jeff Cody @ 2017-04-12 20:46 UTC
To: qemu-devel; +Cc: qemu-block, jsnow, kwolf, peter.maydell, stefanha, pbonzini

This occurs on v2.9.0-rc4, but not on v2.8.0.

When running QEMU with an iothread, and then performing a block-mirror, if
we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
becomes deadlocked.

The block job is not paused, nor cancelled, so we are stuck in the while
loop in block_job_detach_aio_context:

static void block_job_detach_aio_context(void *opaque)
{
    BlockJob *job = opaque;

    /* In case the job terminates during aio_poll()... */
    block_job_ref(job);

    block_job_pause(job);

    while (!job->paused && !job->completed) {
        block_job_drain(job);
    }

    block_job_unref(job);
}


Reproducer script and QAPI commands:

# QEMU script:
gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 -object iothread,id=iothread0 -drive file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0 -m 1024 -boot menu=on -qmp stdio -drive file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7

# QAPI commands:
{ "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0", "target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }

# after BLOCK_JOB_READY, do system reset
{ "execute": "system_reset" }


gdb bt:

(gdb) bt
#0 0x0000555555aa79f3 in bdrv_drain_recurse (bs=bs@entry=0x55555783e900) at block/io.c:164
#1 0x0000555555aa825d in bdrv_drained_begin (bs=bs@entry=0x55555783e900) at block/io.c:231
#2 0x0000555555aa8449 in bdrv_drain (bs=0x55555783e900) at block/io.c:265
#3 0x0000555555a9c356 in blk_drain (blk=<optimized out>) at block/block-backend.c:1383
#4 0x0000555555aa3cfd in mirror_drain (job=<optimized out>) at block/mirror.c:1000
#5 0x0000555555a66e11 in block_job_detach_aio_context (opaque=0x555557a19a40) at blockjob.c:142
#6 0x0000555555a62f4d in bdrv_detach_aio_context (bs=bs@entry=0x555557839410) at block.c:4357
#7 0x0000555555a63116 in bdrv_set_aio_context (bs=bs@entry=0x555557839410, new_context=new_context@entry=0x55555668bc20) at block.c:4418
#8 0x0000555555a9d326 in blk_set_aio_context (blk=0x5555566db520, new_context=0x55555668bc20) at block/block-backend.c:1662
#9 0x00005555557e38da in virtio_blk_data_plane_stop (vdev=<optimized out>) at /home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
#10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8) at hw/virtio/virtio-bus.c:246
#11 0x00005555559fa49b in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8) at hw/virtio/virtio-bus.c:238
#12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x555558300510) at hw/virtio/virtio-pci.c:348
#13 0x00005555559f6a18 in virtio_pci_reset (qdev=<optimized out>) at hw/virtio/virtio-pci.c:1872
#14 0x00005555559139a9 in qdev_reset_one (dev=<optimized out>, opaque=<optimized out>) at hw/core/qdev.c:310
#15 0x0000555555916738 in qbus_walk_children (bus=0x55555693aa30, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
#16 0x0000555555913318 in qdev_walk_children (dev=0x5555569387d0, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/qdev.c:617
#17 0x0000555555916738 in qbus_walk_children (bus=0x555556756f70, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
#18 0x00005555559168ca in qemu_devices_reset () at hw/core/reset.c:69
#19 0x000055555581fcbb in pc_machine_reset () at /home/jcody/work/upstream/qemu-kvm/hw/i386/pc.c:2234
#20 0x00005555558a4d96 in qemu_system_reset (report=<optimized out>) at vl.c:1697
#21 0x000055555577157a in main_loop_should_exit () at vl.c:1865
#22 0x000055555577157a in main_loop () at vl.c:1902
#23 0x000055555577157a in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709

-Jeff
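For context on what the loop above is repeatedly calling: block_job_drain()
is not quoted in the report, but judging from the backtrace (mirror_drain()
running beneath block_job_detach_aio_context()) it does roughly the
following. This is a sketch of the v2.9-era logic, not a verbatim copy of
blockjob.c:

static void block_job_drain(BlockJob *job)
{
    /* If the job coroutine is sleeping (!job->busy), kick it so it can
     * run to its next pause point and observe the pause request made
     * by block_job_pause(). */
    block_job_enter(job);

    /* Let the job driver flush its own in-flight requests; for mirror
     * this is mirror_drain(), which ends up in bdrv_drain() as seen in
     * the backtrace. */
    if (job->driver->drain) {
        job->driver->drain(job);
    }
}

The while loop can therefore only terminate if block_job_enter() actually
manages to re-enter the job coroutine.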
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: John Snow @ 2017-04-12 21:38 UTC
To: Jeff Cody, qemu-devel
Cc: kwolf, peter.maydell, qemu-block, stefanha, pbonzini

On 04/12/2017 04:46 PM, Jeff Cody wrote:
>
> This occurs on v2.9.0-rc4, but not on v2.8.0.
>
> When running QEMU with an iothread, and then performing a block-mirror, if
> we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
> becomes deadlocked.
>
> The block job is not paused, nor cancelled, so we are stuck in the while
> loop in block_job_detach_aio_context:
>
> static void block_job_detach_aio_context(void *opaque)
> {
>     BlockJob *job = opaque;
>
>     /* In case the job terminates during aio_poll()... */
>     block_job_ref(job);
>
>     block_job_pause(job);
>
>     while (!job->paused && !job->completed) {
>         block_job_drain(job);
>     }
>

Looks like when block_job_drain calls block_job_enter from this context
(the main thread, since we're trying to do a system_reset...), we cannot
enter the coroutine because it's the wrong context, so we schedule an
entry instead with

aio_co_schedule(ctx, co);

But that entry never happens, so the job never wakes up, and we never
make enough progress in the coroutine to gracefully pause, so we wedge
here.

> block_job_unref(job);
> }
>
> [snip: reproducer script, QAPI commands, and backtrace quoted in full
> above]

Here's a backtrace for an unoptimized build showing all threads:

https://paste.fedoraproject.org/paste/lLnm8jKeq2wLKF6yEaoEM15M1UNdIGYhyRLivL9gydE=

--js
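The "wrong context" check John describes lives in the coroutine wake-up
helpers in util/async.c; schematically it behaves like this (a simplified
sketch, with names and details approximating the 2.9 code rather than
quoting it):

void aio_co_wake(Coroutine *co)
{
    AioContext *ctx = qemu_coroutine_get_aio_context(co);

    if (ctx != qemu_get_current_aio_context()) {
        /* We are in the wrong thread for this context (e.g. the main
         * thread waking a coroutine owned by another context): do not
         * enter it directly, queue it.  The actual entry happens only
         * when ctx's event loop next dispatches its bottom halves. */
        aio_co_schedule(ctx, co);
        return;
    }

    /* Same thread: safe to enter the coroutine immediately. */
    qemu_coroutine_enter(co);
}

So from the main thread, entering a coroutine that belongs to another
context is always deferred, and something must eventually run that
context's event loop for the entry to happen.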
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Jeff Cody @ 2017-04-12 22:22 UTC
To: John Snow
Cc: qemu-devel, kwolf, peter.maydell, qemu-block, stefanha, pbonzini

On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
> On 04/12/2017 04:46 PM, Jeff Cody wrote:
> > [snip: original problem description and block_job_detach_aio_context()]
>
> Looks like when block_job_drain calls block_job_enter from this context
> (the main thread, since we're trying to do a system_reset...), we cannot
> enter the coroutine because it's the wrong context, so we schedule an
> entry instead with
>
> aio_co_schedule(ctx, co);
>
> But that entry never happens, so the job never wakes up, and we never
> make enough progress in the coroutine to gracefully pause, so we wedge
> here.

John Snow and I debugged this some over IRC. Here is a summary:

Simply put, with iothreads the aio context is different. When
block_job_detach_aio_context() is called from the main thread via the
system reset (from main_loop_should_exit()), it calls block_job_drain() in
a while loop, with job->paused and job->completed as the exit conditions.

block_job_drain() attempts to enter the coroutine (thus allowing
job->paused or job->completed to change). However, since the aio context
is different with iothreads, we schedule the coroutine entry rather than
directly entering it.

This means the job coroutine is never going to be re-entered, because we
are waiting for it to complete in a while loop from the main thread, which
is blocking the qemu timers that would run the scheduled coroutine...
hence, we become stuck.

> [snip: remainder of the original report and backtrace, plus John's
> all-threads backtrace link, quoted in full above]
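The "scheduled coroutine" above is queued as a bottom half on the target
context; roughly, abbreviated from util/async.c (a sketch, trace calls and
details omitted):

void aio_co_schedule(AioContext *ctx, Coroutine *co)
{
    QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                              co, co_scheduled_next);
    qemu_bh_schedule(ctx->co_schedule_bh);
}

That bottom half only fires when the context's event loop gets to run
again. Since the main thread is spinning in block_job_detach_aio_context()
instead of servicing its event loop (and the timers Jeff mentions), the
queued entry never executes.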
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Fam Zheng @ 2017-04-12 23:54 UTC
To: Jeff Cody
Cc: John Snow, kwolf, peter.maydell, qemu-block, qemu-devel, stefanha, pbonzini

On Wed, 04/12 18:22, Jeff Cody wrote:
> [snip: earlier discussion]
>
> John Snow and I debugged this some over IRC. Here is a summary:
>
> Simply put, with iothreads the aio context is different. When
> block_job_detach_aio_context() is called from the main thread via the
> system reset (from main_loop_should_exit()), it calls block_job_drain() in
> a while loop, with job->paused and job->completed as the exit conditions.
>
> block_job_drain() attempts to enter the coroutine (thus allowing
> job->paused or job->completed to change). However, since the aio context
> is different with iothreads, we schedule the coroutine entry rather than
> directly entering it.
>
> This means the job coroutine is never going to be re-entered, because we
> are waiting for it to complete in a while loop from the main thread, which
> is blocking the qemu timers that would run the scheduled coroutine...
> hence, we become stuck.

John and I confirmed that this can be fixed by this pending patch:

[PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html

It didn't make it into 2.9-rc4 because of limited time. :(

Looks like there is no -rc5, so we'll have to document this as a known
issue. Users should "block-job-complete/cancel" as soon as possible to
avoid such a hang.

Fam
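The gist of the referenced patch, heavily simplified (the linked posting
has the real change, which differs in structure): while waiting for the
job to quiesce, the drain path must also iterate the event loop, because
aio_poll() is what dispatches the bottom half that enters the scheduled
coroutine:

/* Simplified sketch of the idea -- not the actual patch. */
while (bdrv_requests_pending(bs)) {
    /* A blocking aio_poll() dispatches bottom halves and timers in
     * addition to completing I/O, so the scheduled coroutine entry
     * can run and job->paused / job->completed can finally change. */
    aio_poll(bdrv_get_aio_context(bs), true);
}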
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Jeff Cody @ 2017-04-13 1:11 UTC
To: Fam Zheng
Cc: John Snow, kwolf, peter.maydell, qemu-block, qemu-devel, stefanha, pbonzini

On Thu, Apr 13, 2017 at 07:54:20AM +0800, Fam Zheng wrote:
> [snip: deadlock summary]
>
> John and I confirmed that this can be fixed by this pending patch:
>
> [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
>
> It didn't make it into 2.9-rc4 because of limited time. :(
>
> Looks like there is no -rc5, so we'll have to document this as a known
> issue. Users should "block-job-complete/cancel" as soon as possible to
> avoid such a hang.

I'd argue for including a fix for 2.9, since this is both a regression,
and a hard lock without possible recovery short of restarting the QEMU
process.

-Jeff
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Jeff Cody @ 2017-04-13 1:57 UTC
To: Fam Zheng
Cc: John Snow, kwolf, peter.maydell, qemu-block, qemu-devel, stefanha, pbonzini

On Wed, Apr 12, 2017 at 09:11:09PM -0400, Jeff Cody wrote:
> On Thu, Apr 13, 2017 at 07:54:20AM +0800, Fam Zheng wrote:
> > [snip]
> >
> > John and I confirmed that this can be fixed by this pending patch:
> >
> > [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
> >
> > [snip]
>
> I'd argue for including a fix for 2.9, since this is both a regression,
> and a hard lock without possible recovery short of restarting the QEMU
> process.
>
> -Jeff

BTW, I can add my verification that the patch you referenced fixed the
issue.

-Jeff
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Paolo Bonzini @ 2017-04-13 5:45 UTC
To: Jeff Cody, Fam Zheng
Cc: kwolf, peter.maydell, qemu-block, qemu-devel, stefanha, John Snow

On 13/04/2017 09:11, Jeff Cody wrote:
> > It didn't make it into 2.9-rc4 because of limited time. :(
> >
> > Looks like there is no -rc5, so we'll have to document this as a known
> > issue. Users should "block-job-complete/cancel" as soon as possible to
> > avoid such a hang.
>
> I'd argue for including a fix for 2.9, since this is both a regression,
> and a hard lock without possible recovery short of restarting the QEMU
> process.

It is a bit of a corner case (and jobs on I/O thread are relatively rare
too), so maybe it's not worth delaying 2.9. It has been delayed already
quite a bit. Another reason I prefer to wait is to ensure that we have an
entry in qemu-iotests to avoid a future regression.

Fam explained to me what happens, and the root cause is that bdrv_drain
never does a release/acquire pair in this case, so the I/O thread remains
stuck in a callback that tries to acquire. Ironically, reintroducing
RFifoLock would probably fix this (not 100% sure). Oops.

His solution is a bit hacky, but we will hopefully be able to revert it in
2.10, or whenever aio_context_acquire/release go away.

Thanks,

Paolo
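The release/acquire pair Paolo says is missing follows a standard pattern
whenever one thread waits on work owned by another AioContext;
illustratively (a hypothetical simplification, not qemu source):

/* Main thread waiting on activity owned by an iothread's ctx.
 * Holding ctx across the whole wait would deadlock any iothread
 * callback that calls aio_context_acquire(ctx) in the meantime. */
aio_context_release(ctx);           /* let the iothread make progress */
while (!done) {
    aio_poll(qemu_get_aio_context(), true);
}
aio_context_acquire(ctx);           /* re-take the lock afterwards */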
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Stefan Hajnoczi @ 2017-04-13 14:39 UTC
To: Paolo Bonzini
Cc: Jeff Cody, Fam Zheng, kwolf, peter.maydell, qemu-block, qemu-devel, John Snow

On Thu, Apr 13, 2017 at 01:45:55PM +0800, Paolo Bonzini wrote:
> On 13/04/2017 09:11, Jeff Cody wrote:
> > I'd argue for including a fix for 2.9, since this is both a regression,
> > and a hard lock without possible recovery short of restarting the QEMU
> > process.
>
> It is a bit of a corner case (and jobs on I/O thread are relatively rare
> too), so maybe it's not worth delaying 2.9. It has been delayed already
> quite a bit. Another reason I prefer to wait is to ensure that we have an
> entry in qemu-iotests to avoid a future regression.

I also think this does not require delaying the release:

1. It needs to be marked as a known issue in the release notes.
2. Let's roll the 2.9.1 stable release within a month of 2.9.0.

If both conditions are met then very few end users will be exposed to the
problem. I hope libvirt will create IOThreads by default soon, but for the
time being it is not a widely used configuration.

Stefan
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Eric Blake @ 2017-04-13 14:45 UTC
To: Stefan Hajnoczi, Paolo Bonzini
Cc: kwolf, peter.maydell, Fam Zheng, qemu-block, Jeff Cody, qemu-devel, John Snow

On 04/13/2017 09:39 AM, Stefan Hajnoczi wrote:
> I also think this does not require delaying the release:
>
> 1. It needs to be marked as a known issue in the release notes.
> 2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
>
> If both conditions are met then very few end users will be exposed to the
> problem. I hope libvirt will create IOThreads by default soon, but for
> the time being it is not a widely used configuration.

Also, is it something that can be avoided by not doing a system_reset
while a block job is still running? Libvirt can be taught to block reset
while a job has still not been finished, if needs be.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Jeff Cody @ 2017-04-13 14:50 UTC
To: Eric Blake
Cc: Stefan Hajnoczi, Paolo Bonzini, kwolf, peter.maydell, Fam Zheng, qemu-block, qemu-devel, John Snow

On Thu, Apr 13, 2017 at 09:45:49AM -0500, Eric Blake wrote:
> [snip]
>
> Also, is it something that can be avoided by not doing a system_reset
> while a block job is still running? Libvirt can be taught to block reset
> while a job has still not been finished, if needs be.

No - if the guest initiates a reboot itself, we still end up deadlocked.

-Jeff
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Jeff Cody @ 2017-04-13 15:02 UTC
To: Stefan Hajnoczi
Cc: Paolo Bonzini, Fam Zheng, kwolf, peter.maydell, qemu-block, qemu-devel, John Snow

On Thu, Apr 13, 2017 at 03:39:59PM +0100, Stefan Hajnoczi wrote:
> [snip]
>
> I also think this does not require delaying the release:
>
> 1. It needs to be marked as a known issue in the release notes.
> 2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
>
> If both conditions are met then very few end users will be exposed to the
> problem. I hope libvirt will create IOThreads by default soon, but for
> the time being it is not a widely used configuration.

Without the fix, iothreads are not usable in 2.9.0, because a running
block job can create a deadlock by a guest-initiated reboot. I think
losing the ability to use iothreads is enough reason to warrant a fix
(especially if an -rc5 may happen anyway).

-Jeff
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: John Snow @ 2017-04-13 17:03 UTC
To: Jeff Cody, Stefan Hajnoczi
Cc: Paolo Bonzini, Fam Zheng, kwolf, peter.maydell, qemu-block, qemu-devel

On 04/13/2017 11:02 AM, Jeff Cody wrote:
> [snip]
>
> Without the fix, iothreads are not usable in 2.9.0, because a running
> block job can create a deadlock by a guest-initiated reboot. I think
> losing the ability to use iothreads is enough reason to warrant a fix
> (especially if an -rc5 may happen anyway).

Not that it's my area of expertise, but given that Fam's "hacky" patch
fixes two issues now, and this is a deadlock that may indeed occur through
normal usage, I'd recommend it go into an rc5 if we're spinning one
anyway.

+1 to Jeff's reasoning.

--js
* Re: [Qemu-devel] [Qemu-block] Regression from 2.8: stuck in bdrv_drain()
From: Stefan Hajnoczi @ 2017-04-13 15:29 UTC
To: Fam Zheng
Cc: Jeff Cody, Kevin Wolf, Peter Maydell, qemu-block, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Thu, Apr 13, 2017 at 6:45 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [snip]
>
> Fam explained to me what happens, and the root cause is that bdrv_drain
> never does a release/acquire pair in this case, so the I/O thread remains
> stuck in a callback that tries to acquire. Ironically, reintroducing
> RFifoLock would probably fix this (not 100% sure). Oops.
>
> His solution is a bit hacky, but we will hopefully be able to revert it
> in 2.10, or whenever aio_context_acquire/release go away.

Fam, many of us will be offline Friday and Monday due to public holidays.
Can you work on a patch that addresses Kevin's concerns with "[PATCH
for-2.9 4/5] block: Drain BH in bdrv_drained_begin"? I'll be officially
offline too but am willing to review the patch.

Thanks,
Stefan
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Peter Maydell @ 2017-04-13 9:48 UTC
To: Fam Zheng
Cc: Jeff Cody, John Snow, Kevin Wolf, Qemu-block, QEMU Developers, Stefan Hajnoczi, Paolo Bonzini

On 13 April 2017 at 00:54, Fam Zheng <famz@redhat.com> wrote:
> John and I confirmed that this can be fixed by this pending patch:
>
> [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
>
> It didn't make it into 2.9-rc4 because of limited time. :(
>
> Looks like there is no -rc5, so we'll have to document this as a known
> issue.

Well, we *hope* there is no -rc5, but if the bug is genuinely a "we can't
release like this" bug, we will obviously have to do another rc.
Basically, you all as the block maintainers should make the call about
whether it's release-critical or not.

thanks
-- PMM
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Eric Blake @ 2017-04-13 14:33 UTC
To: Peter Maydell, Fam Zheng
Cc: Kevin Wolf, Qemu-block, Jeff Cody, QEMU Developers, Stefan Hajnoczi, Paolo Bonzini, John Snow

On 04/13/2017 04:48 AM, Peter Maydell wrote:
> On 13 April 2017 at 00:54, Fam Zheng <famz@redhat.com> wrote:
> > [snip]
> >
> > Looks like there is no -rc5, so we'll have to document this as a known
> > issue.
>
> Well, we *hope* there is no -rc5, but if the bug is genuinely a "we can't
> release like this" bug, we will obviously have to do another rc.
> Basically, you all as the block maintainers should make the call about
> whether it's release-critical or not.

Just curious: is there a technical reason we couldn't spin an -rc5 today
(with just the fix to this issue), and slip the schedule by only two days
instead of a full week? And/or shorten the time for testing -rc5 from the
usual 7 days to 5?

I don't know what other constraints we have to play with, so feel free to
tell me that my idea is not feasible. Also, while I'm a block layer
contributor, I'm not one of its co-maintainers, so I'd trust the replies
from others a bit more than mine when deciding what to do here.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
From: Peter Maydell @ 2017-04-13 14:53 UTC
To: Eric Blake
Cc: Fam Zheng, Kevin Wolf, Qemu-block, Jeff Cody, QEMU Developers, Stefan Hajnoczi, Paolo Bonzini, John Snow

On 13 April 2017 at 15:33, Eric Blake <eblake@redhat.com> wrote:
> On 04/13/2017 04:48 AM, Peter Maydell wrote:
> > Well, we *hope* there is no -rc5, but if the bug is genuinely a "we
> > can't release like this" bug, we will obviously have to do another rc.
> > Basically, you all as the block maintainers should make the call about
> > whether it's release-critical or not.
>
> Just curious: is there a technical reason we couldn't spin an -rc5 today
> (with just the fix to this issue), and slip the schedule by only two days
> instead of a full week? And/or shorten the time for testing -rc5 from the
> usual 7 days to 5?

I've just heard about another issue which will probably require an rc5;
whether that means it makes sense to add this fix too, I don't know.

Not a technical reason, but we're a bit short on time to do an rc5 today,
and I'm away on holiday until Tuesday after that anyway. My current plan
is that we'll roll rc5 on Tuesday, with either just the fix for that other
issue or that plus this one, and then tag the final release on Thursday.

What you lot as the block maintainers should decide is whether this is a
release-critical bug that justifies sticking a fix in at the last minute.
If so, make sure you have a patch on the list which fixes it and has been
reviewed, and flag it up to me so I can apply it on Tuesday.

thanks
-- PMM