* [Qemu-devel] [PATCH 0/5] Fix segfault in blockjob race condition
@ 2017-11-20  2:46 Jeff Cody
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion Jeff Cody
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Jeff Cody @ 2017-11-20  2:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, mreitz, stefanha, famz, pbonzini, kwolf

This series fixes a race condition segfault when using iothreads with
blockjobs.
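
The failure mode is roughly the following (a reconstructed timeline,
not an actual trace):

    block_job_sleep_ns()         /* job->busy = false, timer armed,   */
                                 /* coroutine yields                  */
    block-job-cancel
      -> block_job_enter()       /* job->busy is false, so the        */
                                 /* coroutine is entered early; it    */
                                 /* runs to completion and is freed   */
    /* ...timer fires... */
    co_sleep_cb()                /* wakes the already-freed coroutine */
                                 /* -> use-after-free / segfault      */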

The qemu iotest in this series is a reproducer, as is the reproducer
script attached to this bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=1508708

There are two additional patches that try to catch this sort of
scenario with an abort, before a segfault or memory corruption occurs.

Jeff Cody (5):
  blockjob: do not allow coroutine double entry or
    entry-after-completion
  coroutine: abort if we try to enter coroutine scheduled for another
    ctx
  coroutines: abort if we try to enter a still-sleeping coroutine
  qemu-iotests: add option in common.qemu for mismatch only
  qemu-iotest: add test for blockjob coroutine race condition

 blockjob.c                     |  9 ++--
 include/qemu/coroutine_int.h   |  5 +++
 tests/qemu-iotests/200         | 99 ++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/200.out     | 14 ++++++
 tests/qemu-iotests/common.qemu |  8 +++-
 tests/qemu-iotests/group       |  1 +
 util/async.c                   |  7 +++
 util/qemu-coroutine-sleep.c    |  3 ++
 util/qemu-coroutine.c          | 14 ++++++
 9 files changed, 156 insertions(+), 4 deletions(-)
 create mode 100755 tests/qemu-iotests/200
 create mode 100644 tests/qemu-iotests/200.out

-- 
2.9.5

* [Qemu-devel] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion
  2017-11-20  2:46 [Qemu-devel] [PATCH 0/5] Fix segfault in blockjob race condition Jeff Cody
@ 2017-11-20  2:46 ` Jeff Cody
  2017-11-20 11:16   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx Jeff Cody
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Jeff Cody @ 2017-11-20  2:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, mreitz, stefanha, famz, pbonzini, kwolf

When block_job_sleep_ns() is called, the coroutine is scheduled for
future execution.  If we allow the job to be re-entered prior to the
scheduled time, we introduce a race condition in which a coroutine can
be entered recursively, or even entered after the coroutine is deleted.

The job->busy flag indicates that a blockjob's coroutine is busy
executing.  The function block_job_enter() obeys the busy flag, and
will not enter a coroutine if it is set.  If we sleep a job, we need to
leave the busy flag set, so that subsequent calls to block_job_enter()
are prevented.
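
For reference, block_job_enter() boils down to roughly the following
(a paraphrase, not the exact code):

    void block_job_enter(BlockJob *job)
    {
        if (job->co && !job->busy) {
            bdrv_coroutine_enter(blk_bs(job->blk), job->co);
        }
    }

Leaving job->busy set across the sleep is therefore enough to keep
block_job_enter() from entering the coroutine early.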

This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708

Also, in block_job_start(), set the relevant job flags (.busy, .paused)
before creating the coroutine, not just before executing it.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 blockjob.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 3a0c491..e181295 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
 {
     assert(job && !block_job_started(job) && job->paused &&
            job->driver && job->driver->start);
-    job->co = qemu_coroutine_create(block_job_co_entry, job);
     job->pause_count--;
     job->busy = true;
     job->paused = false;
+    job->co = qemu_coroutine_create(block_job_co_entry, job);
     bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
@@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
         return;
     }
 
-    job->busy = false;
+    /* We need to leave job->busy set here, because when we have
+     * put a coroutine to 'sleep', we have scheduled it to run in
+     * the future.  We cannot enter that same coroutine again before
+     * it wakes and runs, otherwise we risk double-entry or entry after
+     * completion. */
     if (!block_job_should_pause(job)) {
         co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
     }
-    job->busy = true;
 
     block_job_pause_point(job);
 }
-- 
2.9.5

* [Qemu-devel] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx
  2017-11-20  2:46 [Qemu-devel] [PATCH 0/5] Fix segfault in blockjob race condition Jeff Cody
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion Jeff Cody
@ 2017-11-20  2:46 ` Jeff Cody
  2017-11-20 11:28   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine Jeff Cody
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Jeff Cody @ 2017-11-20  2:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, mreitz, stefanha, famz, pbonzini, kwolf

The previous patch fixed a race condition in which coroutines could be
entered twice, or entered after coroutine deletion.

We can detect common scenarios in which this happens, and print an
error and abort before we corrupt memory / data, or segfault.

This patch will abort if an attempt to enter a coroutine is made while
it is currently pending execution in a different AioContext.

We cannot rely on the existing co->caller check for recursive re-entry
to catch this, as the coroutine may run and exit with
COROUTINE_TERMINATE before the AioContext scheduled event happens.

(This is the scenario that was occurring and fixed in the previous
patch.)
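
The guarded sequence is roughly (reconstructed, not from an actual
trace):

    aio_co_schedule(ctx, co);  /* co is now pending in ctx's BH list   */
    qemu_coroutine_enter(co);  /* premature entry: co runs, terminates */
                               /* with COROUTINE_TERMINATE, is freed   */
    /* ...later, the scheduled BH fires... */
    co_schedule_bh_cb(ctx);    /* enters the freed coroutine           */

The co->caller check only catches entry while the coroutine is still
running; once it has terminated, only the new co->scheduled flag can
catch the problem.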

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 include/qemu/coroutine_int.h | 3 +++
 util/async.c                 | 7 +++++++
 util/qemu-coroutine.c        | 9 +++++++++
 3 files changed, 19 insertions(+)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index cb98892..931cdc9 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -53,6 +53,9 @@ struct Coroutine {
 
     /* Only used when the coroutine has yielded.  */
     AioContext *ctx;
+
+    int scheduled;
+
     QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
     QSLIST_ENTRY(Coroutine) co_scheduled_next;
 };
diff --git a/util/async.c b/util/async.c
index 0e1bd87..d459684 100644
--- a/util/async.c
+++ b/util/async.c
@@ -388,6 +388,7 @@ static void co_schedule_bh_cb(void *opaque)
         QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
         trace_aio_co_schedule_bh_cb(ctx, co);
         aio_context_acquire(ctx);
+        co->scheduled = 0;
         qemu_coroutine_enter(co);
         aio_context_release(ctx);
     }
@@ -438,6 +439,12 @@ fail:
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
     trace_aio_co_schedule(ctx, co);
+    if (co->scheduled == 1) {
+        fprintf(stderr,
+                "Cannot schedule a co-routine that is already scheduled\n");
+        abort();
+    }
+    co->scheduled = 1;
     QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                               co, co_scheduled_next);
     qemu_bh_schedule(ctx->co_schedule_bh);
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index d6095c1..2edab63 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -109,6 +109,15 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
 
     trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
 
+    /* if the Coroutine has already been scheduled, entering it again will
+     * cause us to enter it twice, potentially even after the coroutine has
+     * been deleted */
+    if (co->scheduled == 1) {
+        fprintf(stderr, "Cannot enter a co-routine that has already "
+                        "been scheduled to run in a different AioContext\n");
+        abort();
+    }
+
     if (co->caller) {
         fprintf(stderr, "Co-routine re-entered recursively\n");
         abort();
-- 
2.9.5

* [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20  2:46 [Qemu-devel] [PATCH 0/5] Fix segfault in blockjob race condition Jeff Cody
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion Jeff Cody
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx Jeff Cody
@ 2017-11-20  2:46 ` Jeff Cody
  2017-11-20 11:43   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2017-11-20 22:30   ` [Qemu-devel] " Paolo Bonzini
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 4/5] qemu-iotests: add option in common.qemu for mismatch only Jeff Cody
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 5/5] qemu-iotest: add test for blockjob coroutine race condition Jeff Cody
  4 siblings, 2 replies; 22+ messages in thread
From: Jeff Cody @ 2017-11-20  2:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, mreitz, stefanha, famz, pbonzini, kwolf

Once a coroutine is "sleeping", the timer callback will either enter the
coroutine, or schedule it for the next AioContext if using iothreads.

It is illegal to enter that coroutine while waiting for this timer
event and subsequent callback.  This patch will catch such an attempt,
and abort QEMU with an error.

Like with the previous patch, we cannot rely solely on the co->caller
check for recursive entry.  The prematurely entered coroutine may exit
with COROUTINE_TERMINATE before the timer expires, making co->caller no
longer valid.

We can clear co->sleeping in co_sleep_cb(), because any double-entry
attempt after that point should be caught by either the co->scheduled
or co->caller checks.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 include/qemu/coroutine_int.h | 2 ++
 util/qemu-coroutine-sleep.c  | 3 +++
 util/qemu-coroutine.c        | 5 +++++
 3 files changed, 10 insertions(+)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 931cdc9..b071217 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -56,6 +56,8 @@ struct Coroutine {
 
     int scheduled;
 
+    int sleeping;
+
     QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
     QSLIST_ENTRY(Coroutine) co_scheduled_next;
 };
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 9c56550..11ae95a 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/coroutine.h"
+#include "qemu/coroutine_int.h"
 #include "qemu/timer.h"
 #include "block/aio.h"
 
@@ -25,6 +26,7 @@ static void co_sleep_cb(void *opaque)
 {
     CoSleepCB *sleep_cb = opaque;
 
+    sleep_cb->co->sleeping = 0;
     aio_co_wake(sleep_cb->co);
 }
 
@@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
     CoSleepCB sleep_cb = {
         .co = qemu_coroutine_self(),
     };
+    sleep_cb.co->sleeping = 1;
     sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
     timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
     qemu_coroutine_yield();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 2edab63..1d9f93d 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -118,6 +118,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
         abort();
     }
 
+    if (co->sleeping == 1) {
+        fprintf(stderr, "Cannot enter a co-routine that is still sleeping\n");
+        abort();
+    }
+
     if (co->caller) {
         fprintf(stderr, "Co-routine re-entered recursively\n");
         abort();
-- 
2.9.5

* [Qemu-devel] [PATCH 4/5] qemu-iotests: add option in common.qemu for mismatch only
  2017-11-20  2:46 [Qemu-devel] [PATCH 0/5] Fix segfault in blockjob race condition Jeff Cody
                   ` (2 preceding siblings ...)
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine Jeff Cody
@ 2017-11-20  2:46 ` Jeff Cody
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 5/5] qemu-iotest: add test for blockjob coroutine race condition Jeff Cody
  4 siblings, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2017-11-20  2:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, mreitz, stefanha, famz, pbonzini, kwolf

Add an option to echo the response to a QMP / HMP command only on
mismatch.

Useful for ignoring all normal responses while still catching things
like segfaults.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 tests/qemu-iotests/common.qemu | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 7b3052d..85f66b8 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -50,6 +50,8 @@ _in_fd=4
 #
 # If $silent is set to anything but an empty string, then
 # response is not echoed out.
+# If $mismatch_only is set, only non-matching responses will
+# be echoed.
 function _timed_wait_for()
 {
     local h=${1}
@@ -58,14 +60,18 @@ function _timed_wait_for()
     QEMU_STATUS[$h]=0
     while IFS= read -t ${QEMU_COMM_TIMEOUT} resp <&${QEMU_OUT[$h]}
     do
-        if [ -z "${silent}" ]; then
+        if [ -z "${silent}" ] && [ -z "${mismatch_only}" ]; then
             echo "${resp}" | _filter_testdir | _filter_qemu \
                            | _filter_qemu_io | _filter_qmp | _filter_hmp
         fi
         grep -q "${*}" < <(echo "${resp}")
         if [ $? -eq 0 ]; then
             return
+        elif [ -z "${silent}" ] && [ -n "${mismatch_only}" ]; then
+            echo "${resp}" | _filter_testdir | _filter_qemu \
+                           | _filter_qemu_io | _filter_qmp | _filter_hmp
         fi
+
     done
     QEMU_STATUS[$h]=-1
     if [ -z "${qemu_error_no_exit}" ]; then
-- 
2.9.5

* [Qemu-devel] [PATCH 5/5] qemu-iotest: add test for blockjob coroutine race condition
  2017-11-20  2:46 [Qemu-devel] [PATCH 0/5] Fix segfault in blockjob race condition Jeff Cody
                   ` (3 preceding siblings ...)
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 4/5] qemu-iotests: add option in common.qemu for mismatch only Jeff Cody
@ 2017-11-20  2:46 ` Jeff Cody
  4 siblings, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2017-11-20  2:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, mreitz, stefanha, famz, pbonzini, kwolf

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 tests/qemu-iotests/200     | 99 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/200.out | 14 +++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 114 insertions(+)
 create mode 100755 tests/qemu-iotests/200
 create mode 100644 tests/qemu-iotests/200.out

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
new file mode 100755
index 0000000..32fdec8
--- /dev/null
+++ b/tests/qemu-iotests/200
@@ -0,0 +1,99 @@
+#!/bin/bash
+#
+# Block job co-routine race condition test.
+#
+# See: https://bugzilla.redhat.com/show_bug.cgi?id=1508708
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=jcody@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1    # failure is the default!
+
+_cleanup()
+{
+    _cleanup_qemu
+    rm -f "${TEST_IMG}" "${BACKING_IMG}"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 qed qcow
+_supported_proto file
+_supported_os Linux
+
+BACKING_IMG="${TEST_DIR}/backing.img"
+TEST_IMG="${TEST_DIR}/test.img"
+
+${QEMU_IMG} create -f $IMGFMT "${BACKING_IMG}" 512M | _filter_img_create
+${QEMU_IMG} create -f $IMGFMT -F $IMGFMT "${TEST_IMG}" -b "${BACKING_IMG}" 512M | _filter_img_create
+
+${QEMU_IO} -c "write -P 0xa5 512 300M" "${BACKING_IMG}" | _filter_qemu_io
+
+echo
+echo === Starting QEMU VM ===
+echo
+qemu_comm_method="qmp"
+_launch_qemu -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
+             -object iothread,id=iothread0 \
+             -device virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 \
+             -drive file="${TEST_IMG}",media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=$IMGFMT \
+             -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
+h1=$QEMU_HANDLE
+
+_send_qemu_cmd $h1 "{ 'execute': 'qmp_capabilities' }" 'return'
+
+echo
+echo === Sending stream/cancel, checking for SIGSEGV only  ===
+echo
+for (( i=1;i<500;i++ ))
+do
+    mismatch_only='y' qemu_error_no_exit='n' _send_qemu_cmd $h1 \
+                       "{
+                            'execute': 'block-stream',
+                            'arguments': {
+                                'device': 'drive_sysdisk',
+                                'speed': 10000000,
+                                'on-error': 'report',
+                                'job-id': 'job-$i'
+                             }
+                        }
+                        {
+                            'execute': 'block-job-cancel',
+                            'arguments': {
+                                'device': 'job-$i'
+                             }
+                        }" \
+                       "{.*{.*}.*}"  # should match all well-formed QMP responses
+done
+
+silent='y' _send_qemu_cmd $h1  "{ 'execute': 'quit' }" 'return'
+
+echo "$i iterations performed"
+
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/200.out b/tests/qemu-iotests/200.out
new file mode 100644
index 0000000..af6a809
--- /dev/null
+++ b/tests/qemu-iotests/200.out
@@ -0,0 +1,14 @@
+QA output created by 200
+Formatting 'TEST_DIR/backing.img', fmt=IMGFMT size=536870912
+Formatting 'TEST_DIR/test.img', fmt=IMGFMT size=536870912 backing_file=TEST_DIR/backing.img backing_fmt=IMGFMT
+wrote 314572800/314572800 bytes at offset 512
+300 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Starting QEMU VM ===
+
+{"return": {}}
+
+=== Sending stream/cancel, checking for SIGSEGV only ===
+
+500 iterations performed
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 24e5ad1..25d9adf 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -194,3 +194,4 @@
 194 rw auto migration quick
 195 rw auto quick
 197 rw auto quick
+200 rw auto
-- 
2.9.5

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion Jeff Cody
@ 2017-11-20 11:16   ` Stefan Hajnoczi
  2017-11-20 13:36     ` Jeff Cody
  2017-11-20 22:25     ` Paolo Bonzini
  0 siblings, 2 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2017-11-20 11:16 UTC (permalink / raw)
  To: Jeff Cody; +Cc: qemu-devel, kwolf, famz, qemu-block, mreitz, stefanha, pbonzini

On Sun, Nov 19, 2017 at 09:46:42PM -0500, Jeff Cody wrote:
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>  {
>      assert(job && !block_job_started(job) && job->paused &&
>             job->driver && job->driver->start);
> -    job->co = qemu_coroutine_create(block_job_co_entry, job);
>      job->pause_count--;
>      job->busy = true;
>      job->paused = false;
> +    job->co = qemu_coroutine_create(block_job_co_entry, job);
>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>  }
>  

This hunk makes no difference.  The coroutine is only entered by
bdrv_coroutine_enter() so the order of job field initialization doesn't
matter.

> @@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
>          return;
>      }
>  
> -    job->busy = false;
> +    /* We need to leave job->busy set here, because when we have
> +     * put a coroutine to 'sleep', we have scheduled it to run in
> +     * the future.  We cannot enter that same coroutine again before
> +     * it wakes and runs, otherwise we risk double-entry or entry after
> +     * completion. */
>      if (!block_job_should_pause(job)) {
>          co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
>      }
> -    job->busy = true;
>  
>      block_job_pause_point(job);

This leaves a stale doc comment in include/block/blockjob_int.h:

  /**
   * block_job_sleep_ns:
   * @job: The job that calls the function.
   * @clock: The clock to sleep on.
   * @ns: How many nanoseconds to stop for.
   *
   * Put the job to sleep (assuming that it wasn't canceled) for @ns
   * nanoseconds.  Canceling the job will interrupt the wait immediately.
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   */
  void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);

This raises questions about the ability to cancel sleep:

1. Does something depend on cancelling sleep?

2. Did cancellation work properly in commit
   4513eafe928ff47486f4167c28d364c72b5ff7e3 ("block: add
   block_job_sleep_ns") and was it broken afterwards?

It is possible to fix the recursive coroutine entry without losing sleep
cancellation.  Whether it's worth the trouble depends on the answers to
the above questions.

Stefan

* Re: [Qemu-devel] [Qemu-block] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx Jeff Cody
@ 2017-11-20 11:28   ` Stefan Hajnoczi
  2017-11-20 13:42     ` Jeff Cody
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2017-11-20 11:28 UTC (permalink / raw)
  To: Jeff Cody; +Cc: qemu-devel, kwolf, famz, qemu-block, mreitz, stefanha, pbonzini

On Sun, Nov 19, 2017 at 09:46:43PM -0500, Jeff Cody wrote:
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index cb98892..931cdc9 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -53,6 +53,9 @@ struct Coroutine {
>  
>      /* Only used when the coroutine has yielded.  */
>      AioContext *ctx;
> +
> +    int scheduled;

s/int/bool/

> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index d6095c1..2edab63 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -109,6 +109,15 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
>  
>      trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
>  
> +    /* if the Coroutine has already been scheduled, entering it again will
> +     * cause us to enter it twice, potentially even after the coroutine has
> +     * been deleted */
> +    if (co->scheduled == 1) {
> +        fprintf(stderr, "Cannot enter a co-routine that has already "
> +                        "been scheduled to run in a different AioContext\n");

This error message is too specific, the AioContext doesn't have to be
different from the current one:

block/blkdebug.c:        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());

If something calls qemu_aio_coroutine_enter() on the coroutine it might
be from the same AioContext - but still an error condition worth failing
loudly on.

I suggest simplifying the error message:

  fprintf(stderr, "Cannot enter a co-routine that has already "
                  "been scheduled\n");

* Re: [Qemu-devel] [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine Jeff Cody
@ 2017-11-20 11:43   ` Stefan Hajnoczi
  2017-11-20 13:45     ` Jeff Cody
  2017-11-20 22:30   ` [Qemu-devel] " Paolo Bonzini
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2017-11-20 11:43 UTC (permalink / raw)
  To: Jeff Cody; +Cc: qemu-devel, kwolf, famz, qemu-block, mreitz, stefanha, pbonzini

On Sun, Nov 19, 2017 at 09:46:44PM -0500, Jeff Cody wrote:
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index 931cdc9..b071217 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -56,6 +56,8 @@ struct Coroutine {
>  
>      int scheduled;
>  
> +    int sleeping;

s/int/bool/

BTW an alternative to adding individual bools is to implement a finite
state machine for the entire coroutine lifecycle.  A single function can
validate all state transitions:

  void check_state_transition(CoState old, CoState new,
                              const char *action)
  {
      const char *errmsg = fsm[old][new];
      if (!errmsg) {
          return; /* valid transition! */
      }

      fprintf(stderr, "Cannot %s coroutine from %s state\n",
              action, state_name[old]);
      abort();
  }

Specifying fsm[][] forces us to think through all possible state
transitions.  This approach is proactive whereas adding bool flags is
reactive since it only covers a subset of states that were encountered
after crashes.  I'm not sure if it's worth it though :).
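
For example (the states and table entries here are hypothetical, just
to illustrate the shape):

  typedef enum CoState {
      CO_NEW, CO_RUNNING, CO_YIELDED, CO_SCHEDULED,
      CO_SLEEPING, CO_TERMINATED, CO_STATE_MAX,
  } CoState;

  static const char *state_name[CO_STATE_MAX] = {
      "new", "running", "yielded", "scheduled", "sleeping", "terminated",
  };

  /* NULL means the transition is valid; note that unspecified entries
   * default to NULL, so every invalid transition must be listed
   * explicitly. */
  static const char *fsm[CO_STATE_MAX][CO_STATE_MAX] = {
      [CO_SCHEDULED] = {
          [CO_SCHEDULED] = "schedule",  /* double aio_co_schedule()      */
          [CO_SLEEPING]  = "sleep",     /* sleep while pending           */
          /* CO_RUNNING stays NULL: entered by co_schedule_bh_cb() */
      },
      /* ... rows for the other states ... */
  };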

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion
  2017-11-20 11:16   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2017-11-20 13:36     ` Jeff Cody
  2017-11-21 10:47       ` Stefan Hajnoczi
  2017-11-20 22:25     ` Paolo Bonzini
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff Cody @ 2017-11-20 13:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, kwolf, famz, qemu-block, mreitz, stefanha, pbonzini,
	jsnow

On Mon, Nov 20, 2017 at 11:16:53AM +0000, Stefan Hajnoczi wrote:
> On Sun, Nov 19, 2017 at 09:46:42PM -0500, Jeff Cody wrote:
> > --- a/blockjob.c
> > +++ b/blockjob.c
> > @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
> >  {
> >      assert(job && !block_job_started(job) && job->paused &&
> >             job->driver && job->driver->start);
> > -    job->co = qemu_coroutine_create(block_job_co_entry, job);
> >      job->pause_count--;
> >      job->busy = true;
> >      job->paused = false;
> > +    job->co = qemu_coroutine_create(block_job_co_entry, job);
> >      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> >  }
> >  
> 
> This hunk makes no difference.  The coroutine is only entered by
> bdrv_coroutine_enter() so the order of job field initialization doesn't
> matter.
> 

It likely makes no difference with the current code (unless there is a
latent bug). However I made the change to protect against the following
scenario - which, perhaps to your point, would be a bug in any case:

1. job->co = qemu_coroutine_create()

    * Now block_job_started() returns true, as it just checks for job->co

2. Another thread calls block_job_enter(), before we call
   bdrv_coroutine_enter().

    * block_job_enter() checks job->busy and block_job_started() to
      determine if coroutine entry is allowed.  Without this change, these
      checks could pass and coroutine entry could occur.

    * I don't think this can happen in the current code, but the above hunk
      change is still correct, and would protect against such an
      occurrence.
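
(For reference, block_job_started() is essentially just:

    static bool block_job_started(BlockJob *job)
    {
        return job->co;
    }

so the job counts as started as soon as the coroutine object exists.)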


I guess the question is, "is it worth doing?", to try and prevent that sort
of buggy behavior. My thought was "yes" because:

    A) there is no penalty in doing it this way

    B) while a bug, double entry like this can lead to memory and/or
    data corruption, and the checks for co->caller et al. might not
    catch it.  This is particularly true if the coroutine exits
    (COROUTINE_TERMINATE) before the re-entry.

But maybe if we are concerned about that we should figure out a way to
abort() instead.  Of course, that makes allowing recursive coroutines more
difficult in the future.


> > @@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
> >          return;
> >      }
> >  
> > -    job->busy = false;
> > +    /* We need to leave job->busy set here, because when we have
> > +     * put a coroutine to 'sleep', we have scheduled it to run in
> > +     * the future.  We cannot enter that same coroutine again before
> > +     * it wakes and runs, otherwise we risk double-entry or entry after
> > +     * completion. */
> >      if (!block_job_should_pause(job)) {
> >          co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
> >      }
> > -    job->busy = true;
> >  
> >      block_job_pause_point(job);
> 
> This leaves a stale doc comment in include/block/blockjob_int.h:
> 
>   /**
>    * block_job_sleep_ns:
>    * @job: The job that calls the function.
>    * @clock: The clock to sleep on.
>    * @ns: How many nanoseconds to stop for.
>    *
>    * Put the job to sleep (assuming that it wasn't canceled) for @ns
>    * nanoseconds.  Canceling the job will interrupt the wait immediately.
>                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    */

I didn't catch the doc, that should be changed as well.

>   void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
> 
> This raises questions about the ability to cancel sleep:
> 
> 1. Does something depend on cancelling sleep?
> 

Not that I can tell.  The advantage is that you don't have to wait for the
timer, so something like qmp_block_job_cancel() will cancel sooner.

But it is obviously broken with the current coroutine implementation to try
to do that.

> 2. Did cancellation work properly in commit
>    4513eafe928ff47486f4167c28d364c72b5ff7e3 ("block: add
>    block_job_sleep_ns") and was it broken afterwards?
> 

With iothreads, the answer is complicated.  It was broken for a while for
other reasons.

It broke after using aio_co_wake() in the sleep timer cb (commit
2f47da5f7f), which added the ability to schedule a coroutine if the timer
callback was called from the wrong AioContext.

Prior to that it "worked" in that the segfault was not present.

But even bisecting back to 2f47da5f7f was not straightforward, because
attempting the stream/cancel with iothreads would not even work until
c324fd0 (so I only bisected back as far as c324fd0 would cleanly apply).

And it is tricky to say if it "works" or not, because it is racy.  What
may have appeared to work is probably more attributable to luck and
timing.

If the coroutine is going to run at a future time, we cannot enter it
beforehand.  We risk the coroutine not even existing when the timer does run
the sleeping coroutine.  At the very least, early entry with the current
code would require a way to delete the associated timer.

> It is possible to fix the recursive coroutine entry without losing sleep
> cancellation.  Whether it's worth the trouble depends on the answers to
> the above questions.
> 

I contemplated the same thing.

At least for 2.11, fixing recursive coroutine entry is probably more than we
want to do.

Long term, my opinion is that we should fix it, because preventing it
becomes more difficult. It is easy to miss something that might cause a
recursive entry in code reviews, and since it can be racy, casual testing
may often miss it as well.

Jeff

* Re: [Qemu-devel] [Qemu-block] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx
  2017-11-20 11:28   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2017-11-20 13:42     ` Jeff Cody
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2017-11-20 13:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, kwolf, famz, qemu-block, mreitz, stefanha, pbonzini

On Mon, Nov 20, 2017 at 11:28:26AM +0000, Stefan Hajnoczi wrote:
> On Sun, Nov 19, 2017 at 09:46:43PM -0500, Jeff Cody wrote:
> > diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> > index cb98892..931cdc9 100644
> > --- a/include/qemu/coroutine_int.h
> > +++ b/include/qemu/coroutine_int.h
> > @@ -53,6 +53,9 @@ struct Coroutine {
> >  
> >      /* Only used when the coroutine has yielded.  */
> >      AioContext *ctx;
> > +
> > +    int scheduled;
> 
> s/int/bool/
> 

OK.

> > diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> > index d6095c1..2edab63 100644
> > --- a/util/qemu-coroutine.c
> > +++ b/util/qemu-coroutine.c
> > @@ -109,6 +109,15 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
> >  
> >      trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
> >  
> > +    /* if the Coroutine has already been scheduled, entering it again will
> > +     * cause us to enter it twice, potentially even after the coroutine has
> > +     * been deleted */
> > +    if (co->scheduled == 1) {
> > +        fprintf(stderr, "Cannot enter a co-routine that has already "
> > +                        "been scheduled to run in a different AioContext\n");
> 
> This error message is too specific, the AioContext doesn't have to be
> different from the current one:
> 
> block/blkdebug.c:        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
> 
> If something calls qemu_aio_coroutine_enter() on the coroutine it might
> be from the same AioContext - but still an error condition worth failing
> loudly on.
> 

Good point.


> I suggest simplifying the error message:
> 
>   fprintf(stderr, "Cannot enter a co-routine that has already "
>                   "been scheduled\n");

OK. I'll also change the wording here to what you have above, as well:

  void aio_co_schedule(AioContext *ctx, Coroutine *co)
  {
       trace_aio_co_schedule(ctx, co);
  +    if (co->scheduled == 1) {
  +        fprintf(stderr,
  +                "Cannot schedule a co-routine that is already scheduled\n");
  +        abort();
  +    }
  +    co->scheduled = 1;
       QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                                 co, co_scheduled_next);


Jeff

* Re: [Qemu-devel] [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20 11:43   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2017-11-20 13:45     ` Jeff Cody
  2017-11-21 10:17       ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Cody @ 2017-11-20 13:45 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, kwolf, famz, qemu-block, mreitz, stefanha, pbonzini

On Mon, Nov 20, 2017 at 11:43:34AM +0000, Stefan Hajnoczi wrote:
> On Sun, Nov 19, 2017 at 09:46:44PM -0500, Jeff Cody wrote:
> > diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> > index 931cdc9..b071217 100644
> > --- a/include/qemu/coroutine_int.h
> > +++ b/include/qemu/coroutine_int.h
> > @@ -56,6 +56,8 @@ struct Coroutine {
> >  
> >      int scheduled;
> >  
> > +    int sleeping;
> 
> s/int/bool/
> 

OK.

> BTW an alternative to adding individual bools is to implement a finite
> state machine for the entire coroutine lifecycle.  A single function can
> validate all state transitions:
> 
>   void check_state_transition(CoState old, CoState new,
>                               const char *action)
>   {
>       const char *errmsg = fsm[old][new];
>       if (!errmsg) {
>           return; /* valid transition! */
>       }
> 
>       fprintf(stderr, "Cannot %s coroutine from %s state\n",
>               action, state_name[old]);
>       abort();
>   }
> 
> Specifying fsm[][] forces us to think through all possible state
> transitions.  This approach is proactive whereas adding bool flags is
> reactive since it only covers a subset of states that were encountered
> after crashes.  I'm not sure if it's worth it though :).

Interesting idea; maybe more for 2.12 instead of 2.11, though?

Jeff

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion
  2017-11-20 11:16   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2017-11-20 13:36     ` Jeff Cody
@ 2017-11-20 22:25     ` Paolo Bonzini
  2017-11-21 12:42       ` Kevin Wolf
  1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2017-11-20 22:25 UTC (permalink / raw)
  To: Stefan Hajnoczi, Jeff Cody
  Cc: qemu-devel, kwolf, famz, qemu-block, mreitz, stefanha

On 20/11/2017 12:16, Stefan Hajnoczi wrote:
> This raises questions about the ability to cancel sleep:
> 
> 1. Does something depend on cancelling sleep?

block_job_cancel does, but in practice the sleep time is so small
(smaller than SLICE_TIME, which is 100 ms) that we probably don't care.

I agree with Jeff that canceling the sleep by force-entering the
coroutine seemed clever but is probably a very bad idea.

Paolo


* Re: [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20  2:46 ` [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine Jeff Cody
  2017-11-20 11:43   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2017-11-20 22:30   ` Paolo Bonzini
  2017-11-20 22:35     ` Jeff Cody
  1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2017-11-20 22:30 UTC (permalink / raw)
  To: Jeff Cody, qemu-devel; +Cc: qemu-block, mreitz, stefanha, famz, kwolf

On 20/11/2017 03:46, Jeff Cody wrote:
> Once a coroutine is "sleeping", the timer callback will either enter the
> coroutine, or schedule it for the next AioContext if using iothreads.
> 
> It is illegal to enter that coroutine while waiting for this timer
> event and subsequent callback.  This patch will catch such an attempt,
> and abort QEMU with an error.
> 
> Like with the previous patch, we cannot rely solely on the co->caller
> check for recursive entry.  The prematurely entered coroutine may exit
> with COROUTINE_TERMINATE before the timer expires, making co->caller no
> longer valid.
> 
> We can clear co->sleeping in co_sleep_cb(), because any double-entry
> attempt after that point should be caught by either the co->scheduled
> or co->caller checks.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  include/qemu/coroutine_int.h | 2 ++
>  util/qemu-coroutine-sleep.c  | 3 +++
>  util/qemu-coroutine.c        | 5 +++++
>  3 files changed, 10 insertions(+)
> 
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index 931cdc9..b071217 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -56,6 +56,8 @@ struct Coroutine {
>  
>      int scheduled;
>  
> +    int sleeping;

Is this a different "state" (in Stefan's parlance) than scheduled?  In
practice both mean that someone may call qemu_(aio_)coroutine_enter
concurrently, so you'd better not do it yourself.

Paolo

> +
>      QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
>      QSLIST_ENTRY(Coroutine) co_scheduled_next;
>  };
> diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
> index 9c56550..11ae95a 100644
> --- a/util/qemu-coroutine-sleep.c
> +++ b/util/qemu-coroutine-sleep.c
> @@ -13,6 +13,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/coroutine.h"
> +#include "qemu/coroutine_int.h"
>  #include "qemu/timer.h"
>  #include "block/aio.h"
>  
> @@ -25,6 +26,7 @@ static void co_sleep_cb(void *opaque)
>  {
>      CoSleepCB *sleep_cb = opaque;
>  
> +    sleep_cb->co->sleeping = 0;
>      aio_co_wake(sleep_cb->co);
>  }
>  
> @@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
>      CoSleepCB sleep_cb = {
>          .co = qemu_coroutine_self(),
>      };
> +    sleep_cb.co->sleeping = 1;
>      sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
>      timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
>      qemu_coroutine_yield();
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 2edab63..1d9f93d 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -118,6 +118,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
>          abort();
>      }
>  
> +    if (co->sleeping == 1) {
> +        fprintf(stderr, "Cannot enter a co-routine that is still sleeping\n");
> +        abort();
> +    }
> +
>      if (co->caller) {
>          fprintf(stderr, "Co-routine re-entered recursively\n");
>          abort();
> 

* Re: [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20 22:30   ` [Qemu-devel] " Paolo Bonzini
@ 2017-11-20 22:35     ` Jeff Cody
  2017-11-20 22:47       ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Cody @ 2017-11-20 22:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, mreitz, stefanha, famz, kwolf

On Mon, Nov 20, 2017 at 11:30:39PM +0100, Paolo Bonzini wrote:
> On 20/11/2017 03:46, Jeff Cody wrote:
> > Once a coroutine is "sleeping", the timer callback will either enter the
> > coroutine, or schedule it for the next AioContext if using iothreads.
> > 
> > It is illegal to enter that coroutine while waiting for this timer
> > event and subsequent callback.  This patch will catch such an attempt,
> > and abort QEMU with an error.
> > 
> > Like with the previous patch, we cannot rely solely on the co->caller
> > check for recursive entry.  The prematurely entered coroutine may exit
> > with COROUTINE_TERMINATE before the timer expires, making co->caller no
> > longer valid.
> > 
> > We can clear co->sleeping in co_sleep_cb(), because any double-entry
> > attempt after that point should be caught by either the co->scheduled
> > or co->caller checks.
> > 
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> >  include/qemu/coroutine_int.h | 2 ++
> >  util/qemu-coroutine-sleep.c  | 3 +++
> >  util/qemu-coroutine.c        | 5 +++++
> >  3 files changed, 10 insertions(+)
> > 
> > diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> > index 931cdc9..b071217 100644
> > --- a/include/qemu/coroutine_int.h
> > +++ b/include/qemu/coroutine_int.h
> > @@ -56,6 +56,8 @@ struct Coroutine {
> >  
> >      int scheduled;
> >  
> > +    int sleeping;
> 
> Is this a different "state" (in Stefan's parlance) than scheduled?  In
> practice both mean that someone may call qemu_(aio_)coroutine_enter
> concurrently, so you'd better not do it yourself.
> 

It is slightly different; it is from sleeping with a timer via
co_aio_sleep_ns and waking via co_sleep_cb.  Whereas the 'co->scheduled' is
specifically from being scheduled for a specific AioContext, via
aio_co_schedule().

In practice, 'co->scheduled' and 'co->sleeping' certainly rhyme, at the very
least.

But having them separate will put the abort closer to where the problem lies,
so it should make debugging a bit easier if we hit it.

> 
> > +
> >      QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
> >      QSLIST_ENTRY(Coroutine) co_scheduled_next;
> >  };
> > diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
> > index 9c56550..11ae95a 100644
> > --- a/util/qemu-coroutine-sleep.c
> > +++ b/util/qemu-coroutine-sleep.c
> > @@ -13,6 +13,7 @@
> >  
> >  #include "qemu/osdep.h"
> >  #include "qemu/coroutine.h"
> > +#include "qemu/coroutine_int.h"
> >  #include "qemu/timer.h"
> >  #include "block/aio.h"
> >  
> > @@ -25,6 +26,7 @@ static void co_sleep_cb(void *opaque)
> >  {
> >      CoSleepCB *sleep_cb = opaque;
> >  
> > +    sleep_cb->co->sleeping = 0;
> >      aio_co_wake(sleep_cb->co);
> >  }
> >  
> > @@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
> >      CoSleepCB sleep_cb = {
> >          .co = qemu_coroutine_self(),
> >      };
> > +    sleep_cb.co->sleeping = 1;
> >      sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
> >      timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
> >      qemu_coroutine_yield();
> > diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> > index 2edab63..1d9f93d 100644
> > --- a/util/qemu-coroutine.c
> > +++ b/util/qemu-coroutine.c
> > @@ -118,6 +118,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
> >          abort();
> >      }
> >  
> > +    if (co->sleeping == 1) {
> > +        fprintf(stderr, "Cannot enter a co-routine that is still sleeping\n");
> > +        abort();
> > +    }
> > +
> >      if (co->caller) {
> >          fprintf(stderr, "Co-routine re-entered recursively\n");
> >          abort();
> > 
> 

* Re: [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20 22:35     ` Jeff Cody
@ 2017-11-20 22:47       ` Paolo Bonzini
  2017-11-20 23:08         ` Jeff Cody
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2017-11-20 22:47 UTC (permalink / raw)
  To: Jeff Cody; +Cc: qemu-devel, qemu-block, mreitz, stefanha, famz, kwolf

On 20/11/2017 23:35, Jeff Cody wrote:
>> Is this a different "state" (in Stefan's parlance) than scheduled?  In
>> practice both mean that someone may call qemu_(aio_)coroutine_enter
>> concurrently, so you'd better not do it yourself.
>>
> It is slightly different; it is from sleeping with a timer via
> co_aio_sleep_ns and waking via co_sleep_cb.  Whereas the 'co->scheduled' is
> specifically from being scheduled for a specific AioContext, via
> aio_co_schedule().

Right; however, that would only make a difference if we allowed
canceling a co_aio_sleep_ns.  Since we don't want that, they have the
same transitions.

> In practice, 'co->scheduled' and 'co->sleeping' certainly rhyme, at the very
> least.
> 
> But having them separate will put the abort closer to where the problem lies,
> so it should make debugging a bit easier if we hit it.

What do you mean by closer?  It would print a slightly more informative
message, but the message is in qemu_aio_coroutine_enter() for both cases.

In fact, unifying co->scheduled and co->sleeping means that you can
easily abort when co_aio_sleep_ns is called on a scheduled coroutine, like

    /* This is valid. */
    aio_co_schedule(qemu_get_current_aio_context(),
                    qemu_coroutine_self());

    /* But only if there was a qemu_coroutine_yield here.  */
    co_aio_sleep_ns(qemu_get_current_aio_context(), QEMU_CLOCK_REALTIME, 1000);

Thanks,

Paolo

* Re: [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20 22:47       ` Paolo Bonzini
@ 2017-11-20 23:08         ` Jeff Cody
  2017-11-20 23:13           ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Cody @ 2017-11-20 23:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, mreitz, stefanha, famz, kwolf

On Mon, Nov 20, 2017 at 11:47:09PM +0100, Paolo Bonzini wrote:
> On 20/11/2017 23:35, Jeff Cody wrote:
> >> Is this a different "state" (in Stefan's parlance) than scheduled?  In
> >> practice both mean that someone may call qemu_(aio_)coroutine_enter
> >> concurrently, so you'd better not do it yourself.
> >>
> > It is slightly different; it is from sleeping with a timer via
> > co_aio_sleep_ns and waking via co_sleep_cb.  Whereas the 'co->scheduled' is
> > specifically from being scheduled for a specific AioContext, via
> > aio_co_schedule().
> 
> Right; however, that would only make a difference if we allowed
> canceling a co_aio_sleep_ns.  Since we don't want that, they have the
> same transitions.
> 
> > In practice, 'co->scheduled' and 'co->sleeping' certainly rhyme, at the very
> > least.
> > 
> > But having them separate will put the abort closer to where the problem lies,
> > so it should make debugging a bit easier if we hit it.
> 
> What do you mean by closer?  It would print a slightly more informative
> message, but the message is in qemu_aio_coroutine_enter() for both cases.
> 

Sorry, sloppy wording; I meant what you said above, that the error message
is more informative, so by tracking down where co->sleeping is set the
developer is closer to where the problem lies.

> In fact, unifying co->scheduled and co->sleeping means that you can
> easily abort when co_aio_sleep_ns is called on a scheduled coroutine, like
> 
>     /* This is valid. */
>     aio_co_schedule(qemu_get_current_aio_context(),
>                     qemu_coroutine_self());
> 
>     /* But only if there was a qemu_coroutine_yield here.  */
>     co_aio_sleep_ns(qemu_get_current_aio_context(), QEMU_CLOCK_REALTIME, 1000);
>

That is true.  But we could also check (co->sleeping || co->scheduled)
in co_aio_sleep_ns() as well.

Hmm... not checking co->sleeping in co_aio_sleep_ns() is a bug in my
patch.  We don't want to schedule a coroutine on two different timers,
either.

So what do you think about adding this to the patch:

@@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
     CoSleepCB sleep_cb = {
         .co = qemu_coroutine_self(),
     };
+    if (sleep_cb.co->sleeping == 1 || sleep_cb.co->scheduled == 1) {
+       fprintf(stderr, "Cannot sleep a co-routine that is already sleeping "
+                       "or scheduled\n");
+       abort();
+    }
+    sleep_cb.co->sleeping = 1;
     sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
     timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
     qemu_coroutine_yield();


Jeff

* Re: [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20 23:08         ` Jeff Cody
@ 2017-11-20 23:13           ` Paolo Bonzini
  2017-11-20 23:31             ` Jeff Cody
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2017-11-20 23:13 UTC (permalink / raw)
  To: Jeff Cody; +Cc: qemu-devel, qemu-block, mreitz, stefanha, famz, kwolf

On 21/11/2017 00:08, Jeff Cody wrote:
> @@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
>      CoSleepCB sleep_cb = {
>          .co = qemu_coroutine_self(),
>      };
> +    if (sleep_cb.co->sleeping == 1 || sleep_cb.co->scheduled == 1) {
> +       fprintf(stderr, "Cannot sleep a co-routine that is already sleeping "
> +                       "or scheduled\n");
> +       abort();
> +    }
> +    sleep_cb.co->sleeping = 1;
>      sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
>      timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
>      qemu_coroutine_yield();

I understand that this was just an example and not the actual patch, but
I'll still point out that this loses the benefit (better error message)
of keeping the flags separate.

What do you think about making "scheduled" a const char * and assigning
__func__ to it (i.e. either "aio_co_schedule" or "co_aio_sleep_ns")?
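
Something along these lines (an untested sketch):

    /* in struct Coroutine, replacing the two ints: */
    const char *scheduled;  /* NULL, or name of the scheduling function */

    /* in aio_co_schedule() and co_aio_sleep_ns(): */
    if (co->scheduled) {
        fprintf(stderr, "Cannot schedule a co-routine that %s has "
                        "already scheduled\n", co->scheduled);
        abort();
    }
    co->scheduled = __func__;

Then the abort in qemu_aio_coroutine_enter() can report exactly which
function scheduled the coroutine it is refusing to enter.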

Thanks,

Paolo

* Re: [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20 23:13           ` Paolo Bonzini
@ 2017-11-20 23:31             ` Jeff Cody
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2017-11-20 23:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, mreitz, stefanha, famz, kwolf

On Tue, Nov 21, 2017 at 12:13:46AM +0100, Paolo Bonzini wrote:
> On 21/11/2017 00:08, Jeff Cody wrote:
> > @@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
> >      CoSleepCB sleep_cb = {
> >          .co = qemu_coroutine_self(),
> >      };
> > +    if (sleep_cb.co->sleeping == 1 || sleep_cb.co->scheduled == 1) {
> > +       fprintf(stderr, "Cannot sleep a co-routine that is already sleeping "
> > +                       "or scheduled\n");
> > +       abort();
> > +    }
> > +    sleep_cb.co->sleeping = 1;
> >      sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
> >      timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
> >      qemu_coroutine_yield();
> 
> I understand that this was just an example and not the actual patch, but
> I'll still point out that this loses the benefit (better error message)
> of keeping the flags separate.
> 
> What do you think about making "scheduled" a const char * and assigning
> __func__ to it (i.e. either "aio_co_schedule" or "co_aio_sleep_ns")?
> 

Ohhh, nice.  I'll spin a v2 with that, and merge patches 3 and 5 together.
And then maybe for 2.12 we can look at making it a fsm, like Stefan
suggested (or somehow make coroutine entry thread safe and idempotent).

Jeff

* Re: [Qemu-devel] [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine
  2017-11-20 13:45     ` Jeff Cody
@ 2017-11-21 10:17       ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2017-11-21 10:17 UTC (permalink / raw)
  To: Jeff Cody
  Cc: Stefan Hajnoczi, qemu-devel, kwolf, famz, qemu-block, mreitz,
	pbonzini

On Mon, Nov 20, 2017 at 08:45:21AM -0500, Jeff Cody wrote:
> On Mon, Nov 20, 2017 at 11:43:34AM +0000, Stefan Hajnoczi wrote:
> > On Sun, Nov 19, 2017 at 09:46:44PM -0500, Jeff Cody wrote:
> > BTW an alternative to adding individual bools is to implement a finite
> > state machine for the entire coroutine lifecycle.  A single function can
> > validate all state transitions:
> > 
> >   void check_state_transition(CoState old, CoState new,
> >                               const char *action)
> >   {
> >       const char *errmsg = fsm[old][new];
> >       if (!errmsg) {
> >           return; /* valid transition! */
> >       }
> > 
> >       fprintf(stderr, "Cannot %s coroutine from %s state\n",
> >               action, state_name[old]);
> >       abort();
> >   }
> > 
> > Specifying fsm[][] forces us to think through all possible state
> > transitions.  This approach is proactive whereas adding bool flags is
> > reactive since it only covers a subset of states that were encountered
> > after crashes.  I'm not sure if it's worth it though :).
> 
> Interesting idea; maybe more for 2.12 instead of 2.11, though?

Sure.

Stefan

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion
  2017-11-20 13:36     ` Jeff Cody
@ 2017-11-21 10:47       ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2017-11-21 10:47 UTC (permalink / raw)
  To: Jeff Cody
  Cc: Stefan Hajnoczi, qemu-devel, kwolf, famz, qemu-block, mreitz,
	pbonzini, jsnow

On Mon, Nov 20, 2017 at 08:36:19AM -0500, Jeff Cody wrote:
> On Mon, Nov 20, 2017 at 11:16:53AM +0000, Stefan Hajnoczi wrote:
> > On Sun, Nov 19, 2017 at 09:46:42PM -0500, Jeff Cody wrote:
> > > --- a/blockjob.c
> > > +++ b/blockjob.c
> > > @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
> > >  {
> > >      assert(job && !block_job_started(job) && job->paused &&
> > >             job->driver && job->driver->start);
> > > -    job->co = qemu_coroutine_create(block_job_co_entry, job);
> > >      job->pause_count--;
> > >      job->busy = true;
> > >      job->paused = false;
> > > +    job->co = qemu_coroutine_create(block_job_co_entry, job);
> > >      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> > >  }
> > >  
> > 
> > This hunk makes no difference.  The coroutine is only entered by
> > bdrv_coroutine_enter() so the order of job field initialization doesn't
> > matter.
> > 
> 
> It likely makes no difference with the current code (unless there is a
> latent bug). However I made the change to protect against the following
> scenario - which, perhaps to your point, would be a bug in any case:
> 
> 1. job->co = qemu_coroutine_create()
> 
>     * Now block_job_started() returns true, as it just checks for job->co
> 
> 2. Another thread calls block_job_enter(), before we call
>    bdrv_coroutine_enter().

The job is protected by AioContext acquire/release.  Other threads
cannot touch it because the block_job_start() caller has already
acquired the AioContext.
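
(Sketching the convention from memory, not an exact call site:

    AioContext *ctx = blk_get_aio_context(job->blk);

    aio_context_acquire(ctx);
    block_job_start(job);    /* no other thread can touch the job here */
    aio_context_release(ctx);

Anything that wants to touch the job from another thread has to take the
same lock first.)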

> 
>     * block_job_enter() checks job->busy and block_job_started() to
>       determine if coroutine entry is allowed.  Without this change, these
>       checks could pass and coroutine entry could occur.
> 
>     * I don't think this can happen in the current code, but the above hunk
>       change is still correct, and would protect against such an
>       occurrence.
>
> I guess the question is "is it worth doing?" to try to prevent that sort
> of buggy behavior. My thought was "yes" because:
> 
>     A) there is no penalty in doing it this way
> 
>     B) while a bug, double entry like this can lead to memory and/or
>     data corruption, and the checks for co->caller et al. might not
>     catch it.  This is particularly true if the coroutine exits
>     (COROUTINE_TERMINATE) before the re-entry.
> 
> But maybe if we are concerned about that we should figure out a way to
> abort() instead.  Of course, that makes allowing recursive coroutines more
> difficult in the future.

The compiler and CPU can reorder memory accesses so simply reordering
assignment statements is ineffective against threads.
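
(If we genuinely needed cross-thread ordering here, it would take explicit
barriers or atomics -- hypothetically something like:

    job->pause_count--;
    job->busy = true;
    job->paused = false;
    smp_wmb();   /* publish the fields above before job->co is visible */
    atomic_set(&job->co, qemu_coroutine_create(block_job_co_entry, job));

plus a matching smp_rmb()/atomic_read() on every reader -- which is
exactly the complexity the AioContext lock spares us.)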

I'm against merging this hunk because:

1. There is a proper thread-safety mechanism in place that callers are
   already using, so this is the wrong way to attempt to provide
   thread-safety.

2. This change doesn't protect against the multi-threaded scenario you
   described because the memory order isn't being controlled.

> 
> 
> > > @@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
> > >          return;
> > >      }
> > >  
> > > -    job->busy = false;
> > > +    /* We need to leave job->busy set here, because when we have
> > > +     * put a coroutine to 'sleep', we have scheduled it to run in
> > > +     * the future.  We cannot enter that same coroutine again before
> > > +     * it wakes and runs, otherwise we risk double-entry or entry after
> > > +     * completion. */
> > >      if (!block_job_should_pause(job)) {
> > >          co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
> > >      }
> > > -    job->busy = true;
> > >  
> > >      block_job_pause_point(job);
> > 
> > This leaves a stale doc comment in include/block/blockjob_int.h:
> > 
> >   /**
> >    * block_job_sleep_ns:
> >    * @job: The job that calls the function.
> >    * @clock: The clock to sleep on.
> >    * @ns: How many nanoseconds to stop for.
> >    *
> >    * Put the job to sleep (assuming that it wasn't canceled) for @ns
> >    * nanoseconds.  Canceling the job will interrupt the wait immediately.
> >                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >    */
> 
> I didn't catch the doc, that should be changed as well.
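> 
> Maybe something like this (exact wording TBD):
> 
>     * Put the job to sleep (assuming that it wasn't canceled) for @ns
>     * nanoseconds.  Canceling the job will not interrupt the wait, so
>     * the cancel will not process until the coroutine wakes up.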
> 
> >   void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
> > 
> > This raises questions about the ability to cancel sleep:
> > 
> > 1. Does something depend on cancelling sleep?
> > 
> 
> Not that I can tell.  The advantage is that you don't have to wait for the
> timer, so something like qmp_block_job_cancel() will cancel sooner.
> 
> But it is obviously broken with the current coroutine implementation to try
> to do that.
> 
> > 2. Did cancellation work properly in commit
> >    4513eafe928ff47486f4167c28d364c72b5ff7e3 ("block: add
> >    block_job_sleep_ns") and was it broken afterwards?
> > 
> 
> With iothreads, the answer is complicated.  It was broken for a while for
> other reasons.
> 
> It broke after using aio_co_wake() in the sleep timer cb (commit
> 2f47da5f7f), which added the ability to schedule a coroutine if the timer
> callback was called from the wrong AioContext.
> 
> Prior to that it "worked" in that the segfault was not present.
> 
> But even bisecting back to 2f47da5f7f was not straightforward, because
> attempting the stream/cancel with iothreads would not even work until
> c324fd0 (so I only bisected back as far as c324fd0 would cleanly apply).
> 
> And it is tricky to say if it "works" or not, because it is racy.  What may
> have appeared to work may be more attributed to luck and timing.
> 
> If the coroutine is going to run at a future time, we cannot enter it
> beforehand.  We risk the coroutine no longer existing by the time the
> timer fires and runs it.  At the very least, early entry with the current
> code would require a way to delete the associated timer.
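> 
> (Concretely, an early wakeup would have to do something like
> 
>     timer_del(sleep_cb.ts);     /* cancel the pending wakeup */
>     aio_co_wake(sleep_cb.co);   /* then enter the coroutine exactly once */
> 
> but sleep_cb lives on the sleeping coroutine's stack, so nothing outside
> co_aio_sleep_ns() can safely reach it today.)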
> 
> > It is possible to fix the recursive coroutine entry without losing sleep
> > cancellation.  Whether it's worth the trouble depends on the answers to
> > the above questions.
> > 
> 
> I contemplated the same thing.
> 
> At least for 2.11, fixing recursive coroutine entry is probably more than we
> want to do.
> 
> Long term, my opinion is that we should fix it, because preventing it
> becomes more difficult. It is easy to miss something that might cause a
> recursive entry in code reviews, and since it can be racy, casual testing
> may often miss it as well.

I think both your and Paolo's answers show that we don't need to cancel
the timer.  It's okay if the coroutine sleeps for the full duration.
I'm happy with your approach.

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion
  2017-11-20 22:25     ` Paolo Bonzini
@ 2017-11-21 12:42       ` Kevin Wolf
  0 siblings, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2017-11-21 12:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefan Hajnoczi, Jeff Cody, qemu-devel, famz, qemu-block, mreitz,
	stefanha

On 20.11.2017 at 23:25, Paolo Bonzini wrote:
> On 20/11/2017 12:16, Stefan Hajnoczi wrote:
> > This raises questions about the ability to cancel sleep:
> > 
> > 1. Does something depend on cancelling sleep?
> 
> block_job_cancel does, but in practice the sleep time is so small
> (smaller than SLICE_TIME, which is 100 ms) that we probably don't care.

Just note that this is something that can happen during the final
migration phase, when the VM is already stopped. In other words, with
non-shared storage, up to 100 ms of extra delay gets added to the
migration downtime.

Kevin

> I agree with Jeff that canceling the sleep by force-entering the
> coroutine seemed clever but is probably a very bad idea.
> 
> Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads: [~2017-11-21 12:42 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-20  2:46 [Qemu-devel] [PATCH 0/5] Fix segfault in blockjob race condition Jeff Cody
2017-11-20  2:46 ` [Qemu-devel] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion Jeff Cody
2017-11-20 11:16   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-11-20 13:36     ` Jeff Cody
2017-11-21 10:47       ` Stefan Hajnoczi
2017-11-20 22:25     ` Paolo Bonzini
2017-11-21 12:42       ` Kevin Wolf
2017-11-20  2:46 ` [Qemu-devel] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx Jeff Cody
2017-11-20 11:28   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-11-20 13:42     ` Jeff Cody
2017-11-20  2:46 ` [Qemu-devel] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine Jeff Cody
2017-11-20 11:43   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-11-20 13:45     ` Jeff Cody
2017-11-21 10:17       ` Stefan Hajnoczi
2017-11-20 22:30   ` [Qemu-devel] " Paolo Bonzini
2017-11-20 22:35     ` Jeff Cody
2017-11-20 22:47       ` Paolo Bonzini
2017-11-20 23:08         ` Jeff Cody
2017-11-20 23:13           ` Paolo Bonzini
2017-11-20 23:31             ` Jeff Cody
2017-11-20  2:46 ` [Qemu-devel] [PATCH 4/5] qemu-iotests: add option in common.qemu for mismatch only Jeff Cody
2017-11-20  2:46 ` [Qemu-devel] [PATCH 5/5] qemu-iotest: add test for blockjob coroutine race condition Jeff Cody
