* [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
@ 2017-04-12 20:46 Jeff Cody
2017-04-12 21:38 ` John Snow
0 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2017-04-12 20:46 UTC
To: qemu-devel; +Cc: qemu-block, jsnow, kwolf, peter.maydell, stefanha, pbonzini
This occurs on v2.9.0-rc4, but not on v2.8.0.
When running QEMU with an iothread, and then performing a block-mirror, if
we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
becomes deadlocked.
The block job is not paused, nor cancelled, so we are stuck in the while
loop in block_job_detach_aio_context:
static void block_job_detach_aio_context(void *opaque)
{
    BlockJob *job = opaque;

    /* In case the job terminates during aio_poll()... */
    block_job_ref(job);

    block_job_pause(job);

    while (!job->paused && !job->completed) {
        block_job_drain(job);
    }

    block_job_unref(job);
}
Reproducer script and QAPI commands:
# QEMU script:
gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 -object iothread,id=iothread0 -drive file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0 -m 1024 -boot menu=on -qmp stdio -drive file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
# QAPI commands:
{ "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0", "target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }
# after BLOCK_JOB_READY, do system reset
{ "execute": "system_reset" }
gdb bt:
(gdb) bt
#0 0x0000555555aa79f3 in bdrv_drain_recurse (bs=bs@entry=0x55555783e900) at block/io.c:164
#1 0x0000555555aa825d in bdrv_drained_begin (bs=bs@entry=0x55555783e900) at block/io.c:231
#2 0x0000555555aa8449 in bdrv_drain (bs=0x55555783e900) at block/io.c:265
#3 0x0000555555a9c356 in blk_drain (blk=<optimized out>) at block/block-backend.c:1383
#4 0x0000555555aa3cfd in mirror_drain (job=<optimized out>) at block/mirror.c:1000
#5 0x0000555555a66e11 in block_job_detach_aio_context (opaque=0x555557a19a40) at blockjob.c:142
#6 0x0000555555a62f4d in bdrv_detach_aio_context (bs=bs@entry=0x555557839410) at block.c:4357
#7 0x0000555555a63116 in bdrv_set_aio_context (bs=bs@entry=0x555557839410, new_context=new_context@entry=0x55555668bc20) at block.c:4418
#8 0x0000555555a9d326 in blk_set_aio_context (blk=0x5555566db520, new_context=0x55555668bc20) at block/block-backend.c:1662
#9 0x00005555557e38da in virtio_blk_data_plane_stop (vdev=<optimized out>) at /home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
#10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8) at hw/virtio/virtio-bus.c:246
#11 0x00005555559fa49b in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8) at hw/virtio/virtio-bus.c:238
#12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x555558300510) at hw/virtio/virtio-pci.c:348
#13 0x00005555559f6a18 in virtio_pci_reset (qdev=<optimized out>) at hw/virtio/virtio-pci.c:1872
#14 0x00005555559139a9 in qdev_reset_one (dev=<optimized out>, opaque=<optimized out>) at hw/core/qdev.c:310
#15 0x0000555555916738 in qbus_walk_children (bus=0x55555693aa30, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
#16 0x0000555555913318 in qdev_walk_children (dev=0x5555569387d0, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/qdev.c:617
#17 0x0000555555916738 in qbus_walk_children (bus=0x555556756f70, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
#18 0x00005555559168ca in qemu_devices_reset () at hw/core/reset.c:69
#19 0x000055555581fcbb in pc_machine_reset () at /home/jcody/work/upstream/qemu-kvm/hw/i386/pc.c:2234
#20 0x00005555558a4d96 in qemu_system_reset (report=<optimized out>) at vl.c:1697
#21 0x000055555577157a in main_loop_should_exit () at vl.c:1865
#22 0x000055555577157a in main_loop () at vl.c:1902
#23 0x000055555577157a in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709
-Jeff
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-12 20:46 [Qemu-devel] Regression from 2.8: stuck in bdrv_drain() Jeff Cody
@ 2017-04-12 21:38 ` John Snow
2017-04-12 22:22 ` Jeff Cody
0 siblings, 1 reply; 16+ messages in thread
From: John Snow @ 2017-04-12 21:38 UTC
To: Jeff Cody, qemu-devel
Cc: kwolf, peter.maydell, qemu-block, stefanha, pbonzini
On 04/12/2017 04:46 PM, Jeff Cody wrote:
>
> This occurs on v2.9.0-rc4, but not on v2.8.0.
>
> When running QEMU with an iothread, and then performing a block-mirror, if
> we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
> becomes deadlocked.
>
> The block job is not paused, nor cancelled, so we are stuck in the while
> loop in block_job_detach_aio_context:
>
> static void block_job_detach_aio_context(void *opaque)
> {
>     BlockJob *job = opaque;
>
>     /* In case the job terminates during aio_poll()... */
>     block_job_ref(job);
>
>     block_job_pause(job);
>
>     while (!job->paused && !job->completed) {
>         block_job_drain(job);
>     }
>
Looks like when block_job_drain calls block_job_enter from this context
(the main thread, since we're trying to do a system_reset...), we cannot
enter the coroutine because it's the wrong context, so we schedule an
entry instead with
aio_co_schedule(ctx, co);
But that entry never happens, so the job never wakes up and we never
make enough progress in the coroutine to gracefully pause, so we wedge here.
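For reference, that decision is roughly this shape (paraphrased sketch, not
the verbatim source; the helper name here is made up):

/* Direct coroutine entry only happens when we are already running in the
 * coroutine's own AioContext; otherwise the entry is queued and needs
 * that context's event loop to actually run it. */
static void enter_or_schedule(AioContext *ctx, Coroutine *co)
{
    if (qemu_get_current_aio_context() == ctx) {
        qemu_coroutine_enter(co);      /* runs the job coroutine right now */
    } else {
        aio_co_schedule(ctx, co);      /* deferred to ctx's event loop */
    }
}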
>     block_job_unref(job);
> }
>
>
> Reproducer script and QAPI commands:
>
> # QEMU script:
> gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 -object iothread,id=iothread0 -drive file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0 -m 1024 -boot menu=on -qmp stdio -drive file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
>
>
> # QAPI commands:
> { "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0", "target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }
>
>
> # after BLOCK_JOB_READY, do system reset
> { "execute": "system_reset" }
>
>
>
>
>
> gdb bt:
>
> (gdb) bt
> #0 0x0000555555aa79f3 in bdrv_drain_recurse (bs=bs@entry=0x55555783e900) at block/io.c:164
> #1 0x0000555555aa825d in bdrv_drained_begin (bs=bs@entry=0x55555783e900) at block/io.c:231
> #2 0x0000555555aa8449 in bdrv_drain (bs=0x55555783e900) at block/io.c:265
> #3 0x0000555555a9c356 in blk_drain (blk=<optimized out>) at block/block-backend.c:1383
> #4 0x0000555555aa3cfd in mirror_drain (job=<optimized out>) at block/mirror.c:1000
> #5 0x0000555555a66e11 in block_job_detach_aio_context (opaque=0x555557a19a40) at blockjob.c:142
> #6 0x0000555555a62f4d in bdrv_detach_aio_context (bs=bs@entry=0x555557839410) at block.c:4357
> #7 0x0000555555a63116 in bdrv_set_aio_context (bs=bs@entry=0x555557839410, new_context=new_context@entry=0x55555668bc20) at block.c:4418
> #8 0x0000555555a9d326 in blk_set_aio_context (blk=0x5555566db520, new_context=0x55555668bc20) at block/block-backend.c:1662
> #9 0x00005555557e38da in virtio_blk_data_plane_stop (vdev=<optimized out>) at /home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
> #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8) at hw/virtio/virtio-bus.c:246
> #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8) at hw/virtio/virtio-bus.c:238
> #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x555558300510) at hw/virtio/virtio-pci.c:348
> #13 0x00005555559f6a18 in virtio_pci_reset (qdev=<optimized out>) at hw/virtio/virtio-pci.c:1872
> #14 0x00005555559139a9 in qdev_reset_one (dev=<optimized out>, opaque=<optimized out>) at hw/core/qdev.c:310
> #15 0x0000555555916738 in qbus_walk_children (bus=0x55555693aa30, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
> #16 0x0000555555913318 in qdev_walk_children (dev=0x5555569387d0, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/qdev.c:617
> #17 0x0000555555916738 in qbus_walk_children (bus=0x555556756f70, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
> #18 0x00005555559168ca in qemu_devices_reset () at hw/core/reset.c:69
> #19 0x000055555581fcbb in pc_machine_reset () at /home/jcody/work/upstream/qemu-kvm/hw/i386/pc.c:2234
> #20 0x00005555558a4d96 in qemu_system_reset (report=<optimized out>) at vl.c:1697
> #21 0x000055555577157a in main_loop_should_exit () at vl.c:1865
> #22 0x000055555577157a in main_loop () at vl.c:1902
> #23 0x000055555577157a in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709
>
>
> -Jeff
>
Here's a backtrace for an unoptimized build showing all threads:
https://paste.fedoraproject.org/paste/lLnm8jKeq2wLKF6yEaoEM15M1UNdIGYhyRLivL9gydE=
--js
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-12 21:38 ` John Snow
@ 2017-04-12 22:22 ` Jeff Cody
2017-04-12 23:54 ` Fam Zheng
0 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2017-04-12 22:22 UTC
To: John Snow
Cc: qemu-devel, kwolf, peter.maydell, qemu-block, stefanha, pbonzini
On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
>
>
> On 04/12/2017 04:46 PM, Jeff Cody wrote:
> >
> > This occurs on v2.9.0-rc4, but not on v2.8.0.
> >
> > When running QEMU with an iothread, and then performing a block-mirror, if
> > we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
> > becomes deadlocked.
> >
> > The block job is not paused, nor cancelled, so we are stuck in the while
> > loop in block_job_detach_aio_context:
> >
> > static void block_job_detach_aio_context(void *opaque)
> > {
> >     BlockJob *job = opaque;
> >
> >     /* In case the job terminates during aio_poll()... */
> >     block_job_ref(job);
> >
> >     block_job_pause(job);
> >
> >     while (!job->paused && !job->completed) {
> >         block_job_drain(job);
> >     }
> >
>
> Looks like when block_job_drain calls block_job_enter from this context
> (the main thread, since we're trying to do a system_reset...), we cannot
> enter the coroutine because it's the wrong context, so we schedule an
> entry instead with
>
> aio_co_schedule(ctx, co);
>
> But that entry never happens, so the job never wakes up and we never
> make enough progress in the coroutine to gracefully pause, so we wedge here.
>
John Snow and I debugged this some over IRC. Here is a summary:
Simply put, with iothreads the aio context is different. When
block_job_detach_aio_context() is called from the main thread via the system
reset (from main_loop_should_exit()), it calls block_job_drain() in a while
loop, with job->paused and job->completed as exit conditions.
block_job_drain() attempts to enter the coroutine (thus allowing job->paused
or job->completed to change). However, since the aio context is different
with iothreads, we schedule the coroutine entry rather than directly
entering it.
This means the job coroutine is never going to be re-entered, because we are
waiting for it to complete in a while loop from the main thread, which is
blocking the qemu timers which would run the scheduled coroutine... hence,
we become stuck.
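Schematically, the two sides of the deadlock (an annotated sketch of the
loop quoted above, not new code):

/* Main thread, holding the job's AioContext, during the reset: */
block_job_pause(job);
while (!job->paused && !job->completed) {
    block_job_drain(job);   /* -> block_job_enter() -> aio_co_schedule(),
                             * since we are not in the job's context */
}
/* Nothing inside this loop dispatches the scheduled coroutine entry, so
 * the job never reaches a pause point: job->paused and job->completed
 * stay false and the loop spins forever. */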
> >     block_job_unref(job);
> > }
> >
>
> >
> > Reproducer script and QAPI commands:
> >
> > # QEMU script:
> > gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 -object iothread,id=iothread0 -drive file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0 -m 1024 -boot menu=on -qmp stdio -drive file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
> >
> >
> > # QAPI commands:
> > { "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0", "target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }
> >
> >
> > # after BLOCK_JOB_READY, do system reset
> > { "execute": "system_reset" }
> >
> >
> >
> >
> >
> > gdb bt:
> >
> > (gdb) bt
> > #0 0x0000555555aa79f3 in bdrv_drain_recurse (bs=bs@entry=0x55555783e900) at block/io.c:164
> > #1 0x0000555555aa825d in bdrv_drained_begin (bs=bs@entry=0x55555783e900) at block/io.c:231
> > #2 0x0000555555aa8449 in bdrv_drain (bs=0x55555783e900) at block/io.c:265
> > #3 0x0000555555a9c356 in blk_drain (blk=<optimized out>) at block/block-backend.c:1383
> > #4 0x0000555555aa3cfd in mirror_drain (job=<optimized out>) at block/mirror.c:1000
> > #5 0x0000555555a66e11 in block_job_detach_aio_context (opaque=0x555557a19a40) at blockjob.c:142
> > #6 0x0000555555a62f4d in bdrv_detach_aio_context (bs=bs@entry=0x555557839410) at block.c:4357
> > #7 0x0000555555a63116 in bdrv_set_aio_context (bs=bs@entry=0x555557839410, new_context=new_context@entry=0x55555668bc20) at block.c:4418
> > #8 0x0000555555a9d326 in blk_set_aio_context (blk=0x5555566db520, new_context=0x55555668bc20) at block/block-backend.c:1662
> > #9 0x00005555557e38da in virtio_blk_data_plane_stop (vdev=<optimized out>) at /home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
> > #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8) at hw/virtio/virtio-bus.c:246
> > #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8) at hw/virtio/virtio-bus.c:238
> > #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x555558300510) at hw/virtio/virtio-pci.c:348
> > #13 0x00005555559f6a18 in virtio_pci_reset (qdev=<optimized out>) at hw/virtio/virtio-pci.c:1872
> > #14 0x00005555559139a9 in qdev_reset_one (dev=<optimized out>, opaque=<optimized out>) at hw/core/qdev.c:310
> > #15 0x0000555555916738 in qbus_walk_children (bus=0x55555693aa30, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
> > #16 0x0000555555913318 in qdev_walk_children (dev=0x5555569387d0, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/qdev.c:617
> > #17 0x0000555555916738 in qbus_walk_children (bus=0x555556756f70, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
> > #18 0x00005555559168ca in qemu_devices_reset () at hw/core/reset.c:69
> > #19 0x000055555581fcbb in pc_machine_reset () at /home/jcody/work/upstream/qemu-kvm/hw/i386/pc.c:2234
> > #20 0x00005555558a4d96 in qemu_system_reset (report=<optimized out>) at vl.c:1697
> > #21 0x000055555577157a in main_loop_should_exit () at vl.c:1865
> > #22 0x000055555577157a in main_loop () at vl.c:1902
> > #23 0x000055555577157a in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709
> >
> >
> > -Jeff
> >
>
> Here's a backtrace for an unoptimized build showing all threads:
>
> https://paste.fedoraproject.org/paste/lLnm8jKeq2wLKF6yEaoEM15M1UNdIGYhyRLivL9gydE=
>
>
> --js
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-12 22:22 ` Jeff Cody
@ 2017-04-12 23:54 ` Fam Zheng
2017-04-13 1:11 ` Jeff Cody
2017-04-13 9:48 ` [Qemu-devel] " Peter Maydell
0 siblings, 2 replies; 16+ messages in thread
From: Fam Zheng @ 2017-04-12 23:54 UTC
To: Jeff Cody
Cc: John Snow, kwolf, peter.maydell, qemu-block, qemu-devel, stefanha,
pbonzini
On Wed, 04/12 18:22, Jeff Cody wrote:
> On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
> >
> >
> > On 04/12/2017 04:46 PM, Jeff Cody wrote:
> > >
> > > This occurs on v2.9.0-rc4, but not on v2.8.0.
> > >
> > > When running QEMU with an iothread, and then performing a block-mirror, if
> > > we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
> > > becomes deadlocked.
> > >
> > > The block job is not paused, nor cancelled, so we are stuck in the while
> > > loop in block_job_detach_aio_context:
> > >
> > > static void block_job_detach_aio_context(void *opaque)
> > > {
> > >     BlockJob *job = opaque;
> > >
> > >     /* In case the job terminates during aio_poll()... */
> > >     block_job_ref(job);
> > >
> > >     block_job_pause(job);
> > >
> > >     while (!job->paused && !job->completed) {
> > >         block_job_drain(job);
> > >     }
> > >
> >
> > Looks like when block_job_drain calls block_job_enter from this context
> > (the main thread, since we're trying to do a system_reset...), we cannot
> > enter the coroutine because it's the wrong context, so we schedule an
> > entry instead with
> >
> > aio_co_schedule(ctx, co);
> >
> > But that entry never happens, so the job never wakes up and we never
> > make enough progress in the coroutine to gracefully pause, so we wedge here.
> >
>
>
> John Snow and I debugged this some over IRC. Here is a summary:
>
> Simply put, with iothreads the aio context is different. When
> block_job_detach_aio_context() is called from the main thread via the system
> reset (from main_loop_should_exit()), it calls block_job_drain() in a while
> > loop, with job->paused and job->completed as exit conditions.
>
> > block_job_drain() attempts to enter the coroutine (thus allowing job->paused
> or job->completed to change). However, since the aio context is different
> with iothreads, we schedule the coroutine entry rather than directly
> entering it.
>
> This means the job coroutine is never going to be re-entered, because we are
> waiting for it to complete in a while loop from the main thread, which is
> blocking the qemu timers which would run the scheduled coroutine... hence,
> we become stuck.
John and I confirmed that this can be fixed by this pending patch:
[PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
It didn't make it into 2.9-rc4 because of limited time. :(
Looks like there is no -rc5, we'll have to document this as a known issue.
Users should "block-job-complete/cancel" as soon as possible to avoid such a
hang.
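The rough idea of the patch (a sketch of the approach only; see the link
above for the real change) is to let pending BHs, including scheduled
coroutine entries, run while we drain:

void bdrv_drained_begin(BlockDriverState *bs)
{
    /* ... existing quiesce logic ... */

    /* Dispatch ready BHs and scheduled coroutine entries on the BDS's
     * context until no more progress is made, so a coroutine parked by
     * aio_co_schedule() can run and reach its pause point. */
    while (aio_poll(bdrv_get_aio_context(bs), false)) {
        /* aio_poll() did the work; loop until it reports no progress */
    }

    /* ... */
}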
Fam
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-12 23:54 ` Fam Zheng
@ 2017-04-13 1:11 ` Jeff Cody
2017-04-13 1:57 ` Jeff Cody
2017-04-13 5:45 ` Paolo Bonzini
2017-04-13 9:48 ` [Qemu-devel] " Peter Maydell
1 sibling, 2 replies; 16+ messages in thread
From: Jeff Cody @ 2017-04-13 1:11 UTC
To: Fam Zheng
Cc: John Snow, kwolf, peter.maydell, qemu-block, qemu-devel, stefanha,
pbonzini
On Thu, Apr 13, 2017 at 07:54:20AM +0800, Fam Zheng wrote:
> On Wed, 04/12 18:22, Jeff Cody wrote:
> > On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
> > >
> > >
> > > On 04/12/2017 04:46 PM, Jeff Cody wrote:
> > > >
> > > > This occurs on v2.9.0-rc4, but not on v2.8.0.
> > > >
> > > > When running QEMU with an iothread, and then performing a block-mirror, if
> > > > we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
> > > > becomes deadlocked.
> > > >
> > > > The block job is not paused, nor cancelled, so we are stuck in the while
> > > > loop in block_job_detach_aio_context:
> > > >
> > > > static void block_job_detach_aio_context(void *opaque)
> > > > {
> > > >     BlockJob *job = opaque;
> > > >
> > > >     /* In case the job terminates during aio_poll()... */
> > > >     block_job_ref(job);
> > > >
> > > >     block_job_pause(job);
> > > >
> > > >     while (!job->paused && !job->completed) {
> > > >         block_job_drain(job);
> > > >     }
> > > >
> > >
> > > Looks like when block_job_drain calls block_job_enter from this context
> > > (the main thread, since we're trying to do a system_reset...), we cannot
> > > enter the coroutine because it's the wrong context, so we schedule an
> > > entry instead with
> > >
> > > aio_co_schedule(ctx, co);
> > >
> > > But that entry never happens, so the job never wakes up and we never
> > > make enough progress in the coroutine to gracefully pause, so we wedge here.
> > >
> >
> >
> > John Snow and I debugged this some over IRC. Here is a summary:
> >
> > Simply put, with iothreads the aio context is different. When
> > block_job_detach_aio_context() is called from the main thread via the system
> > reset (from main_loop_should_exit()), it calls block_job_drain() in a while
> > > loop, with job->paused and job->completed as exit conditions.
> >
> > > block_job_drain() attempts to enter the coroutine (thus allowing job->paused
> > or job->completed to change). However, since the aio context is different
> > with iothreads, we schedule the coroutine entry rather than directly
> > entering it.
> >
> > This means the job coroutine is never going to be re-entered, because we are
> > waiting for it to complete in a while loop from the main thread, which is
> > blocking the qemu timers which would run the scheduled coroutine... hence,
> > we become stuck.
>
> John and I confirmed that this can be fixed by this pending patch:
>
> [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
>
> It didn't make it into 2.9-rc4 because of limited time. :(
>
> Looks like there is no -rc5, we'll have to document this as a known issue.
> Users should "block-job-complete/cancel" as soon as possible to avoid such a
> hang.
>
I'd argue for including a fix for 2.9, since this is both a regression, and
a hard lock without possible recovery short of restarting the QEMU process.
-Jeff
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 1:11 ` Jeff Cody
@ 2017-04-13 1:57 ` Jeff Cody
2017-04-13 5:45 ` Paolo Bonzini
1 sibling, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2017-04-13 1:57 UTC
To: Fam Zheng
Cc: John Snow, kwolf, peter.maydell, qemu-block, qemu-devel, stefanha,
pbonzini
On Wed, Apr 12, 2017 at 09:11:09PM -0400, Jeff Cody wrote:
> On Thu, Apr 13, 2017 at 07:54:20AM +0800, Fam Zheng wrote:
> > On Wed, 04/12 18:22, Jeff Cody wrote:
> > > On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
> > > >
> > > >
> > > > On 04/12/2017 04:46 PM, Jeff Cody wrote:
> > > > >
> > > > > This occurs on v2.9.0-rc4, but not on v2.8.0.
> > > > >
> > > > > When running QEMU with an iothread, and then performing a block-mirror, if
> > > > > we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
> > > > > becomes deadlocked.
> > > > >
> > > > > The block job is not paused, nor cancelled, so we are stuck in the while
> > > > > loop in block_job_detach_aio_context:
> > > > >
> > > > > static void block_job_detach_aio_context(void *opaque)
> > > > > {
> > > > >     BlockJob *job = opaque;
> > > > >
> > > > >     /* In case the job terminates during aio_poll()... */
> > > > >     block_job_ref(job);
> > > > >
> > > > >     block_job_pause(job);
> > > > >
> > > > >     while (!job->paused && !job->completed) {
> > > > >         block_job_drain(job);
> > > > >     }
> > > > >
> > > >
> > > > Looks like when block_job_drain calls block_job_enter from this context
> > > > (the main thread, since we're trying to do a system_reset...), we cannot
> > > > enter the coroutine because it's the wrong context, so we schedule an
> > > > entry instead with
> > > >
> > > > aio_co_schedule(ctx, co);
> > > >
> > > > But that entry never happens, so the job never wakes up and we never
> > > > make enough progress in the coroutine to gracefully pause, so we wedge here.
> > > >
> > >
> > >
> > > John Snow and I debugged this some over IRC. Here is a summary:
> > >
> > > Simply put, with iothreads the aio context is different. When
> > > block_job_detach_aio_context() is called from the main thread via the system
> > > reset (from main_loop_should_exit()), it calls block_job_drain() in a while
> > > > loop, with job->paused and job->completed as exit conditions.
> > >
> > > > block_job_drain() attempts to enter the coroutine (thus allowing job->paused
> > > or job->completed to change). However, since the aio context is different
> > > with iothreads, we schedule the coroutine entry rather than directly
> > > entering it.
> > >
> > > This means the job coroutine is never going to be re-entered, because we are
> > > waiting for it to complete in a while loop from the main thread, which is
> > > blocking the qemu timers which would run the scheduled coroutine... hence,
> > > we become stuck.
> >
> > John and I confirmed that this can be fixed by this pending patch:
> >
> > [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
> >
> > It didn't make it into 2.9-rc4 because of limited time. :(
> >
> > Looks like there is no -rc5, we'll have to document this as a known issue.
> > Users should "block-job-complete/cancel" as soon as possible to avoid such a
> > hang.
> >
>
> I'd argue for including a fix for 2.9, since this is both a regression, and
> a hard lock without possible recovery short of restarting the QEMU process.
>
> -Jeff
BTW, I can add my verification that the patch you referenced fixed the
issue.
-Jeff
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 1:11 ` Jeff Cody
2017-04-13 1:57 ` Jeff Cody
@ 2017-04-13 5:45 ` Paolo Bonzini
2017-04-13 14:39 ` Stefan Hajnoczi
2017-04-13 15:29 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
1 sibling, 2 replies; 16+ messages in thread
From: Paolo Bonzini @ 2017-04-13 5:45 UTC
To: Jeff Cody, Fam Zheng
Cc: kwolf, peter.maydell, qemu-block, qemu-devel, stefanha, John Snow
On 13/04/2017 09:11, Jeff Cody wrote:
>> It didn't make it into 2.9-rc4 because of limited time. :(
>>
>> Looks like there is no -rc5, we'll have to document this as a known issue.
>> Users should "block-job-complete/cancel" as soon as possible to avoid such a
>> hang.
>
> I'd argue for including a fix for 2.9, since this is both a regression, and
> a hard lock without possible recovery short of restarting the QEMU process.
It is a bit of a corner case (and jobs on I/O thread are relatively rare
too), so maybe it's not worth delaying 2.9. It has been delayed already
quite a bit. Another reason I think I prefer to wait is to ensure that
we have an entry in qemu-iotests to avoid the future regression.
Fam explained to me what happens, and the root cause is that bdrv_drain
never does a release/acquire pair in this case, so the I/O thread
remains stuck in a callback that tries to acquire. Ironically
reintroducing RFifoLock would probably fix this (not 100% sure). Oops.
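Concretely, the missing pairing has this shape (illustrative only, not a
proposed patch):

/* ctx is the job's iothread AioContext, held by the main thread while
 * it spins in block_job_detach_aio_context(): */
aio_context_release(ctx);   /* lets the iothread's stuck callback finally
                             * acquire ctx and make progress */
/* ... give the iothread a chance to run ... */
aio_context_acquire(ctx);   /* take the context back and re-check the job */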
His solution is a bit hacky, but we will hopefully be able to revert it
in 2.10 or whenever aio_context_acquire/release will go away.
Thanks,
Paolo
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-12 23:54 ` Fam Zheng
2017-04-13 1:11 ` Jeff Cody
@ 2017-04-13 9:48 ` Peter Maydell
2017-04-13 14:33 ` Eric Blake
1 sibling, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2017-04-13 9:48 UTC
To: Fam Zheng
Cc: Jeff Cody, John Snow, Kevin Wolf, Qemu-block, QEMU Developers,
Stefan Hajnoczi, Paolo Bonzini
On 13 April 2017 at 00:54, Fam Zheng <famz@redhat.com> wrote:
> John and I confirmed that this can be fixed by this pending patch:
>
> [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
>
> It didn't make it into 2.9-rc4 because of limited time. :(
>
> Looks like there is no -rc5, we'll have to document this as a known issue.
Well, we *hope* there is no -rc5, but if the bug is genuinely
a "we can't release like this" bug we will obviously have to
do another rc. Basically you all as the block maintainers should
make the call about whether it's release-critical or not.
thanks
-- PMM
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 9:48 ` [Qemu-devel] " Peter Maydell
@ 2017-04-13 14:33 ` Eric Blake
2017-04-13 14:53 ` Peter Maydell
0 siblings, 1 reply; 16+ messages in thread
From: Eric Blake @ 2017-04-13 14:33 UTC
To: Peter Maydell, Fam Zheng
Cc: Kevin Wolf, Qemu-block, Jeff Cody, QEMU Developers,
Stefan Hajnoczi, Paolo Bonzini, John Snow
On 04/13/2017 04:48 AM, Peter Maydell wrote:
> On 13 April 2017 at 00:54, Fam Zheng <famz@redhat.com> wrote:
>> John and I confirmed that this can be fixed by this pending patch:
>>
>> [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
>>
>> It didn't make it into 2.9-rc4 because of limited time. :(
>>
>> Looks like there is no -rc5, we'll have to document this as a known issue.
>
> Well, we *hope* there is no -rc5, but if the bug is genuinely
> a "we can't release like this" bug we will obviously have to
> do another rc. Basically you all as the block maintainers should
> make the call about whether it's release-critical or not.
Just curious: is there a technical reason we couldn't spin an -rc5 today
(with just the fix to this issue), and slip the schedule only by two
days instead of a full week? And/or shorten the time for testing -rc5
from the usual 7 days into 5?
I don't know what other constraints we have to play with, so feel free
to tell me that my idea is not feasible. Also, while I'm a block layer
contributor, I'm not one of its co-maintainers, so I'd trust the replies
from others a bit more than mine when deciding what to do here.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 5:45 ` Paolo Bonzini
@ 2017-04-13 14:39 ` Stefan Hajnoczi
2017-04-13 14:45 ` Eric Blake
2017-04-13 15:02 ` Jeff Cody
2017-04-13 15:29 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
1 sibling, 2 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2017-04-13 14:39 UTC
To: Paolo Bonzini
Cc: Jeff Cody, Fam Zheng, kwolf, peter.maydell, qemu-block,
qemu-devel, John Snow
On Thu, Apr 13, 2017 at 01:45:55PM +0800, Paolo Bonzini wrote:
>
>
> On 13/04/2017 09:11, Jeff Cody wrote:
> >> It didn't make it into 2.9-rc4 because of limited time. :(
> >>
> >> Looks like there is no -rc5, we'll have to document this as a known issue.
> >> Users should "block-job-complete/cancel" as soon as possible to avoid such a
> >> hang.
> >
> > I'd argue for including a fix for 2.9, since this is both a regression, and
> > a hard lock without possible recovery short of restarting the QEMU process.
>
> It is a bit of a corner case (and jobs on I/O thread are relatively rare
> too), so maybe it's not worth delaying 2.9. It has been delayed already
> quite a bit. Another reason I think I prefer to wait is to ensure that
> we have an entry in qemu-iotests to avoid the future regression.
I also think this does not require delaying the release:
1. It needs to be marked as a known issue in the release notes.
2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
If both conditions are met then very few end users will be exposed to
the problem. I hope libvirt will create IOThreads by default soon but
for the time being it is not a widely used configuration.
Stefan
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 14:39 ` Stefan Hajnoczi
@ 2017-04-13 14:45 ` Eric Blake
2017-04-13 14:50 ` Jeff Cody
2017-04-13 15:02 ` Jeff Cody
1 sibling, 1 reply; 16+ messages in thread
From: Eric Blake @ 2017-04-13 14:45 UTC
To: Stefan Hajnoczi, Paolo Bonzini
Cc: kwolf, peter.maydell, Fam Zheng, qemu-block, Jeff Cody,
qemu-devel, John Snow
On 04/13/2017 09:39 AM, Stefan Hajnoczi wrote:
> On Thu, Apr 13, 2017 at 01:45:55PM +0800, Paolo Bonzini wrote:
>>
>>
>> On 13/04/2017 09:11, Jeff Cody wrote:
>>>> It didn't make it into 2.9-rc4 because of limited time. :(
>>>>
>>>> Looks like there is no -rc5, we'll have to document this as a known issue.
>>>> Users should "block-job-complete/cancel" as soon as possible to avoid such a
>>>> hang.
>>>
>>> I'd argue for including a fix for 2.9, since this is both a regression, and
>>> a hard lock without possible recovery short of restarting the QEMU process.
>>
>> It is a bit of a corner case (and jobs on I/O thread are relatively rare
>> too), so maybe it's not worth delaying 2.9. It has been delayed already
>> quite a bit. Another reason I think I prefer to wait is to ensure that
>> we have an entry in qemu-iotests to avoid the future regression.
>
> I also think this does not require delaying the release:
>
> 1. It needs to be marked as a known issue in the release notes.
> 2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
>
> If both conditions are met then very few end users will be exposed to
> the problem. I hope libvirt will create IOThreads by default soon but
> for the time being it is not a widely used configuration.
Also, is it something that can be avoided by not doing a system_reset
while a block job is still running? Libvirt can be taught to block reset
while a job has still not been finished, if needs be.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 14:45 ` Eric Blake
@ 2017-04-13 14:50 ` Jeff Cody
0 siblings, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2017-04-13 14:50 UTC
To: Eric Blake
Cc: Stefan Hajnoczi, Paolo Bonzini, kwolf, peter.maydell, Fam Zheng,
qemu-block, qemu-devel, John Snow
On Thu, Apr 13, 2017 at 09:45:49AM -0500, Eric Blake wrote:
> On 04/13/2017 09:39 AM, Stefan Hajnoczi wrote:
> > On Thu, Apr 13, 2017 at 01:45:55PM +0800, Paolo Bonzini wrote:
> >>
> >>
> >> On 13/04/2017 09:11, Jeff Cody wrote:
> >>>> It didn't make it into 2.9-rc4 because of limited time. :(
> >>>>
> >>>> Looks like there is no -rc5, we'll have to document this as a known issue.
> >>>> Users should "block-job-complete/cancel" as soon as possible to avoid such a
> >>>> hang.
> >>>
> >>> I'd argue for including a fix for 2.9, since this is both a regression, and
> >>> a hard lock without possible recovery short of restarting the QEMU process.
> >>
> >> It is a bit of a corner case (and jobs on I/O thread are relatively rare
> >> too), so maybe it's not worth delaying 2.9. It has been delayed already
> >> quite a bit. Another reason I think I prefer to wait is to ensure that
> >> we have an entry in qemu-iotests to avoid the future regression.
> >
> > I also think this does not require delaying the release:
> >
> > 1. It needs to be marked as a known issue in the release notes.
> > 2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
> >
> > If both conditions are met then very few end users will be exposed to
> > the problem. I hope libvirt will create IOThreads by default soon but
> > for the time being it is not a widely used configuration.
>
> Also, is it something that can be avoided by not doing a system_reset
> while a block job is still running? Libvirt can be taught to block reset
> while a job has still not been finished, if needs be.
>
No - if the guest initiates a reboot itself, we still end up deadlocked.
-Jeff
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 14:33 ` Eric Blake
@ 2017-04-13 14:53 ` Peter Maydell
0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2017-04-13 14:53 UTC
To: Eric Blake
Cc: Fam Zheng, Kevin Wolf, Qemu-block, Jeff Cody, QEMU Developers,
Stefan Hajnoczi, Paolo Bonzini, John Snow
On 13 April 2017 at 15:33, Eric Blake <eblake@redhat.com> wrote:
> On 04/13/2017 04:48 AM, Peter Maydell wrote:
>> Well, we *hope* there is no -rc5, but if the bug is genuinely
>> a "we can't release like this" bug we will obviously have to
>> do another rc. Basically you all as the block maintainers should
>> make the call about whether it's release-critical or not.
>
> Just curious: is there a technical reason we couldn't spin an -rc5 today
> (with just the fix to this issue), and slip the schedule only by two
> days instead of a full week? And/or shorten the time for testing -rc5
> from the usual 7 days into 5?
I've just heard about another issue which probably will
require an rc5. Whether that means it makes sense to add
this fix too I don't know.
Not a technical reason, but we're a bit short on time to do an
rc5 today, and I'm away on holiday until Tuesday after that
anyway.
My current plan is that we'll roll rc5 on Tuesday with
either just the fix for that other issue or that plus this
one, and then tag final release Thursday.
What you lot as the block maintainers should decide is
whether this is a release-critical bug that justifies sticking
a fix in at the last minute, and if so make sure you have
a patch on the list which fixes it and has been reviewed
and flag it up to me so I can apply it on Tuesday.
thanks
-- PMM
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 14:39 ` Stefan Hajnoczi
2017-04-13 14:45 ` Eric Blake
@ 2017-04-13 15:02 ` Jeff Cody
2017-04-13 17:03 ` John Snow
1 sibling, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2017-04-13 15:02 UTC
To: Stefan Hajnoczi
Cc: Paolo Bonzini, Fam Zheng, kwolf, peter.maydell, qemu-block,
qemu-devel, John Snow
On Thu, Apr 13, 2017 at 03:39:59PM +0100, Stefan Hajnoczi wrote:
> On Thu, Apr 13, 2017 at 01:45:55PM +0800, Paolo Bonzini wrote:
> >
> >
> > On 13/04/2017 09:11, Jeff Cody wrote:
> > >> It didn't make it into 2.9-rc4 because of limited time. :(
> > >>
> > >> Looks like there is no -rc5, we'll have to document this as a known issue.
> > >> Users should "block-job-complete/cancel" as soon as possible to avoid such a
> > >> hang.
> > >
> > > I'd argue for including a fix for 2.9, since this is both a regression, and
> > > a hard lock without possible recovery short of restarting the QEMU process.
> >
> > It is a bit of a corner case (and jobs on I/O thread are relatively rare
> > too), so maybe it's not worth delaying 2.9. It has been delayed already
> > quite a bit. Another reason I think I prefer to wait is to ensure that
> > we have an entry in qemu-iotests to avoid the future regression.
>
> I also think this does not require delaying the release:
>
> 1. It needs to be marked as a known issue in the release notes.
> 2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
>
> If both conditions are met then very few end users will be exposed to
> the problem. I hope libvirt will create IOThreads by default soon but
> for the time being it is not a widely used configuration.
>
Without the fix, iothreads are not usable in 2.9.0, because a running block
job can create a deadlock by a guest-initiated reboot. I think losing the
ability to use iothreads is enough reason to warrant a fix (especially if an
-rc5 may happen anyway).
-Jeff
* Re: [Qemu-devel] [Qemu-block] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 5:45 ` Paolo Bonzini
2017-04-13 14:39 ` Stefan Hajnoczi
@ 2017-04-13 15:29 ` Stefan Hajnoczi
1 sibling, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2017-04-13 15:29 UTC
To: Fam Zheng
Cc: Jeff Cody, Kevin Wolf, Peter Maydell, qemu block, qemu-devel,
Stefan Hajnoczi, Paolo Bonzini
On Thu, Apr 13, 2017 at 6:45 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 13/04/2017 09:11, Jeff Cody wrote:
>>> It didn't make it into 2.9-rc4 because of limited time. :(
>>>
>>> Looks like there is no -rc5, we'll have to document this as a known issue.
>>> Users should "block-job-complete/cancel" as soon as possible to avoid such a
>>> hang.
>>
>> I'd argue for including a fix for 2.9, since this is both a regression, and
>> a hard lock without possible recovery short of restarting the QEMU process.
>
> It is a bit of a corner case (and jobs on I/O thread are relatively rare
> too), so maybe it's not worth delaying 2.9. It has been delayed already
> quite a bit. Another reason I think I prefer to wait is to ensure that
> we have an entry in qemu-iotests to avoid the future regression.
>
> Fam explained to me what happens, and the root cause is that bdrv_drain
> never does a release/acquire pair in this case, so the I/O thread
> remains stuck in a callback that tries to acquire. Ironically
> reintroducing RFifoLock would probably fix this (not 100% sure). Oops.
>
> His solution is a bit hacky, but we will hopefully be able to revert it
> in 2.10 or whenever aio_context_acquire/release will go away.
Fam, many of us will be offline Friday and Monday due to public
holidays. Can you work on a patch that addresses Kevin's concerns
with "[PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin"?
I'll be officially offline too but am willing to review the patch.
Thanks,
Stefan
* Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
2017-04-13 15:02 ` Jeff Cody
@ 2017-04-13 17:03 ` John Snow
0 siblings, 0 replies; 16+ messages in thread
From: John Snow @ 2017-04-13 17:03 UTC
To: Jeff Cody, Stefan Hajnoczi
Cc: Paolo Bonzini, Fam Zheng, kwolf, peter.maydell, qemu-block,
qemu-devel
On 04/13/2017 11:02 AM, Jeff Cody wrote:
> On Thu, Apr 13, 2017 at 03:39:59PM +0100, Stefan Hajnoczi wrote:
>> On Thu, Apr 13, 2017 at 01:45:55PM +0800, Paolo Bonzini wrote:
>>>
>>>
>>> On 13/04/2017 09:11, Jeff Cody wrote:
>>>>> It didn't make it into 2.9-rc4 because of limited time. :(
>>>>>
>>>>> Looks like there is no -rc5, we'll have to document this as a known issue.
>>>>> Users should "block-job-complete/cancel" as soon as possible to avoid such a
>>>>> hang.
>>>>
>>>> I'd argue for including a fix for 2.9, since this is both a regression, and
>>>> a hard lock without possible recovery short of restarting the QEMU process.
>>>
>>> It is a bit of a corner case (and jobs on I/O thread are relatively rare
>>> too), so maybe it's not worth delaying 2.9. It has been delayed already
>>> quite a bit. Another reason I think I prefer to wait is to ensure that
>>> we have an entry in qemu-iotests to avoid the future regression.
>>
>> I also think this does not require delaying the release:
>>
>> 1. It needs to be marked as a known issue in the release notes.
>> 2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
>>
>> If both conditions are met then very few end users will be exposed to
>> the problem. I hope libvirt will create IOThreads by default soon but
>> for the time being it is not a widely used configuration.
>>
>
> Without the fix, iothreads are not usable in 2.9.0, because a running block
> job can create a deadlock by a guest-initiated reboot. I think losing the
> ability to use iothreads is enough reason to warrant a fix (especially if an
> -rc5 may happen anyway).
>
> -Jeff
>
Not that it's my area of expertise, but given that Fam's "hacky" patch
fixes two issues now and this is a deadlock that may indeed occur
through normal usage, I'd recommend it go into an rc5 if we're spinning
one anyway.
+1 to Jeff's reasoning.
--js