* [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()
@ 2023-09-05 14:50 Kevin Wolf
2023-09-05 14:50 ` [PATCH 1/2] vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn Kevin Wolf
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Kevin Wolf @ 2023-09-05 14:50 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, mst, quintela, peterx, leobras, stefanha, pbonzini,
qemu-block, qemu-stable
This fixes a recently introduced assertion failure that was reported to
happen when migrating virtio-net with a failover. The latent bug that
we're executing code in coroutine context that was never supposed to run
there has existed for a long time. However, the new assertion that
callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
makes it very visible because it's now always a crash.
Kevin Wolf (2):
vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
virtio: Drop out of coroutine context in virtio_load()
include/migration/vmstate.h | 8 ++++---
hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++-----
2 files changed, 45 insertions(+), 8 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
2023-09-05 14:50 [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
@ 2023-09-05 14:50 ` Kevin Wolf
2023-09-05 15:06 ` Peter Xu
2023-09-05 14:50 ` [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2023-09-05 14:50 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, mst, quintela, peterx, leobras, stefanha, pbonzini,
qemu-block, qemu-stable
Migration code can run both in coroutine context (the usual case) and
non-coroutine context (at least savevm/loadvm for snapshots). This also
affects the VMState callbacks, and devices must consider this. Change
the callback definition in VMStateInfo to be explicit about it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/migration/vmstate.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index d1b8abe08d..e4db910339 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -41,9 +41,11 @@ typedef struct VMStateField VMStateField;
*/
struct VMStateInfo {
const char *name;
- int (*get)(QEMUFile *f, void *pv, size_t size, const VMStateField *field);
- int (*put)(QEMUFile *f, void *pv, size_t size, const VMStateField *field,
- JSONWriter *vmdesc);
+ int coroutine_mixed_fn (*get)(QEMUFile *f, void *pv, size_t size,
+ const VMStateField *field);
+ int coroutine_mixed_fn (*put)(QEMUFile *f, void *pv, size_t size,
+ const VMStateField *field,
+ JSONWriter *vmdesc);
};
enum VMStateFlags {
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()
2023-09-05 14:50 [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
2023-09-05 14:50 ` [PATCH 1/2] vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn Kevin Wolf
@ 2023-09-05 14:50 ` Kevin Wolf
2023-09-07 18:40 ` Stefan Hajnoczi
2023-10-17 5:19 ` Michael Tokarev
2023-09-07 18:42 ` [PATCH 0/2] " Stefan Hajnoczi
2023-09-08 10:46 ` Stefan Hajnoczi
3 siblings, 2 replies; 13+ messages in thread
From: Kevin Wolf @ 2023-09-05 14:50 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, mst, quintela, peterx, leobras, stefanha, pbonzini,
qemu-block, qemu-stable
virtio_load() as a whole should run in coroutine context because it
reads from the migration stream and we don't want this to block.
However, it calls virtio_set_features_nocheck() and devices don't
expect their .set_features callback to run in a coroutine and therefore
call functions that may not be called in coroutine context. To fix this,
drop out of coroutine context for calling virtio_set_features_nocheck().
Without this fix, the following crash was reported:
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007efc738477f3 in __GI_abort () at abort.c:79
#4 0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()",
file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92
#5 0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275,
function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101
#6 0x0000560aebcd8dd6 in bdrv_register_buf ()
#7 0x0000560aeb97ed97 in ram_block_added.llvm ()
#8 0x0000560aebb8303f in ram_block_add.llvm ()
#9 0x0000560aebb834fa in qemu_ram_alloc_internal.llvm ()
#10 0x0000560aebb2ac98 in vfio_region_mmap ()
#11 0x0000560aebb3ea0f in vfio_bars_register ()
#12 0x0000560aebb3c628 in vfio_realize ()
#13 0x0000560aeb90f0c2 in pci_qdev_realize ()
#14 0x0000560aebc40305 in device_set_realized ()
#15 0x0000560aebc48e07 in property_set_bool.llvm ()
#16 0x0000560aebc46582 in object_property_set ()
#17 0x0000560aebc4cd58 in object_property_set_qobject ()
#18 0x0000560aebc46ba7 in object_property_set_bool ()
#19 0x0000560aeb98b3ca in qdev_device_add_from_qdict ()
#20 0x0000560aebb1fbaf in virtio_net_set_features ()
#21 0x0000560aebb46b51 in virtio_set_features_nocheck ()
#22 0x0000560aebb47107 in virtio_load ()
#23 0x0000560aeb9ae7ce in vmstate_load_state ()
#24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main ()
#25 0x0000560aeb9d45e1 in qemu_loadvm_state ()
#26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm ()
#27 0x0000560aebeace56 in coroutine_trampoline.llvm ()
Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-832
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 40 insertions(+), 5 deletions(-)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 309038fd46..969c25f4cf 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2825,8 +2825,9 @@ static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
}
/* A wrapper for use as a VMState .get function */
-static int virtio_device_get(QEMUFile *f, void *opaque, size_t size,
- const VMStateField *field)
+static int coroutine_mixed_fn
+virtio_device_get(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field)
{
VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev));
@@ -2853,6 +2854,39 @@ static int virtio_set_features_nocheck(VirtIODevice *vdev, uint64_t val)
return bad ? -1 : 0;
}
+typedef struct VirtioSetFeaturesNocheckData {
+ Coroutine *co;
+ VirtIODevice *vdev;
+ uint64_t val;
+ int ret;
+} VirtioSetFeaturesNocheckData;
+
+static void virtio_set_features_nocheck_bh(void *opaque)
+{
+ VirtioSetFeaturesNocheckData *data = opaque;
+
+ data->ret = virtio_set_features_nocheck(data->vdev, data->val);
+ aio_co_wake(data->co);
+}
+
+static int coroutine_mixed_fn
+virtio_set_features_nocheck_maybe_co(VirtIODevice *vdev, uint64_t val)
+{
+ if (qemu_in_coroutine()) {
+ VirtioSetFeaturesNocheckData data = {
+ .co = qemu_coroutine_self(),
+ .vdev = vdev,
+ .val = val,
+ };
+ aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
+ virtio_set_features_nocheck_bh, &data);
+ qemu_coroutine_yield();
+ return data.ret;
+ } else {
+ return virtio_set_features_nocheck(vdev, val);
+ }
+}
+
int virtio_set_features(VirtIODevice *vdev, uint64_t val)
{
int ret;
@@ -2906,7 +2940,8 @@ size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
return config_size;
}
-int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
+int coroutine_mixed_fn
+virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
{
int i, ret;
int32_t config_len;
@@ -3023,14 +3058,14 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
* host_features.
*/
uint64_t features64 = vdev->guest_features;
- if (virtio_set_features_nocheck(vdev, features64) < 0) {
+ if (virtio_set_features_nocheck_maybe_co(vdev, features64) < 0) {
error_report("Features 0x%" PRIx64 " unsupported. "
"Allowed features: 0x%" PRIx64,
features64, vdev->host_features);
return -1;
}
} else {
- if (virtio_set_features_nocheck(vdev, features) < 0) {
+ if (virtio_set_features_nocheck_maybe_co(vdev, features) < 0) {
error_report("Features 0x%x unsupported. "
"Allowed features: 0x%" PRIx64,
features, vdev->host_features);
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
2023-09-05 14:50 ` [PATCH 1/2] vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn Kevin Wolf
@ 2023-09-05 15:06 ` Peter Xu
0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2023-09-05 15:06 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, mst, quintela, leobras, stefanha, pbonzini,
qemu-block, qemu-stable
On Tue, Sep 05, 2023 at 04:50:01PM +0200, Kevin Wolf wrote:
> Migration code can run both in coroutine context (the usual case) and
> non-coroutine context (at least savevm/loadvm for snapshots). This also
> affects the VMState callbacks, and devices must consider this. Change
> the callback definition in VMStateInfo to be explicit about it.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()
2023-09-05 14:50 ` [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
@ 2023-09-07 18:40 ` Stefan Hajnoczi
2023-09-08 8:59 ` Kevin Wolf
2023-10-17 5:19 ` Michael Tokarev
1 sibling, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-09-07 18:40 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, mst, quintela, peterx, leobras, pbonzini, qemu-block,
qemu-stable
[-- Attachment #1: Type: text/plain, Size: 3308 bytes --]
On Tue, Sep 05, 2023 at 04:50:02PM +0200, Kevin Wolf wrote:
> virtio_load() as a whole should run in coroutine context because it
> reads from the migration stream and we don't want this to block.
Is that "should" a "must" or a "can"?
If it's a "must" then virtio_load() needs assert(qemu_in_coroutine()).
But the previous patch mentioned that loadvm for snapshots calls it
outside coroutine context. So maybe it's a "can"?
>
> However, it calls virtio_set_features_nocheck() and devices don't
> expect their .set_features callback to run in a coroutine and therefore
> call functions that may not be called in coroutine context. To fix this,
> drop out of coroutine context for calling virtio_set_features_nocheck().
>
> Without this fix, the following crash was reported:
>
> #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
> #1 0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
> #2 0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3 0x00007efc738477f3 in __GI_abort () at abort.c:79
> #4 0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()",
> file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92
> #5 0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275,
> function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101
> #6 0x0000560aebcd8dd6 in bdrv_register_buf ()
> #7 0x0000560aeb97ed97 in ram_block_added.llvm ()
> #8 0x0000560aebb8303f in ram_block_add.llvm ()
> #9 0x0000560aebb834fa in qemu_ram_alloc_internal.llvm ()
> #10 0x0000560aebb2ac98 in vfio_region_mmap ()
> #11 0x0000560aebb3ea0f in vfio_bars_register ()
> #12 0x0000560aebb3c628 in vfio_realize ()
> #13 0x0000560aeb90f0c2 in pci_qdev_realize ()
> #14 0x0000560aebc40305 in device_set_realized ()
> #15 0x0000560aebc48e07 in property_set_bool.llvm ()
> #16 0x0000560aebc46582 in object_property_set ()
> #17 0x0000560aebc4cd58 in object_property_set_qobject ()
> #18 0x0000560aebc46ba7 in object_property_set_bool ()
> #19 0x0000560aeb98b3ca in qdev_device_add_from_qdict ()
> #20 0x0000560aebb1fbaf in virtio_net_set_features ()
> #21 0x0000560aebb46b51 in virtio_set_features_nocheck ()
> #22 0x0000560aebb47107 in virtio_load ()
> #23 0x0000560aeb9ae7ce in vmstate_load_state ()
> #24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main ()
> #25 0x0000560aeb9d45e1 in qemu_loadvm_state ()
> #26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm ()
> #27 0x0000560aebeace56 in coroutine_trampoline.llvm ()
>
> Cc: qemu-stable@nongnu.org
> Buglink: https://issues.redhat.com/browse/RHEL-832
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 40 insertions(+), 5 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()
2023-09-05 14:50 [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
2023-09-05 14:50 ` [PATCH 1/2] vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn Kevin Wolf
2023-09-05 14:50 ` [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
@ 2023-09-07 18:42 ` Stefan Hajnoczi
2023-09-08 8:53 ` Kevin Wolf
2023-09-08 10:46 ` Stefan Hajnoczi
3 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-09-07 18:42 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, mst, quintela, peterx, leobras, pbonzini, qemu-block,
qemu-stable
[-- Attachment #1: Type: text/plain, Size: 1021 bytes --]
On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> This fixes a recently introduced assertion failure that was reported to
> happen when migrating virtio-net with a failover. The latent bug that
> we're executing code in coroutine context that was never supposed to run
> there has existed for a long time. However, the new assertion that
> callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> makes it very visible because it's now always a crash.
>
> Kevin Wolf (2):
> vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
> virtio: Drop out of coroutine context in virtio_load()
>
> include/migration/vmstate.h | 8 ++++---
> hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++-----
> 2 files changed, 45 insertions(+), 8 deletions(-)
This looks like a bandaid for a specific instance of this problem rather
than a solution that takes care of the root cause.
Is it possible to make VMStateInfo.get/put() consistenty coroutine_fn?
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()
2023-09-07 18:42 ` [PATCH 0/2] " Stefan Hajnoczi
@ 2023-09-08 8:53 ` Kevin Wolf
0 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2023-09-08 8:53 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, mst, quintela, peterx, leobras, pbonzini, qemu-block,
qemu-stable
[-- Attachment #1: Type: text/plain, Size: 1862 bytes --]
Am 07.09.2023 um 20:42 hat Stefan Hajnoczi geschrieben:
> On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> > This fixes a recently introduced assertion failure that was reported to
> > happen when migrating virtio-net with a failover. The latent bug that
> > we're executing code in coroutine context that was never supposed to run
> > there has existed for a long time. However, the new assertion that
> > callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> > makes it very visible because it's now always a crash.
> >
> > Kevin Wolf (2):
> > vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
> > virtio: Drop out of coroutine context in virtio_load()
> >
> > include/migration/vmstate.h | 8 ++++---
> > hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++-----
> > 2 files changed, 45 insertions(+), 8 deletions(-)
>
> This looks like a bandaid for a specific instance of this problem
> rather than a solution that takes care of the root cause.
>
> Is it possible to make VMStateInfo.get/put() consistenty coroutine_fn?
I think it is. Note that this doesn't solve the problem, virtio_load()
calls functions that must run _outside_ coroutine context. So once the
migration code is cleaned up to consistenly run in coroutine context,
you can remove the check and the one line for the !qemu_in_coroutine()
case from this series. The rest stays as it is.
It is not a solution that takes care of the root cause, but I also can't
think of one. The problem is that VMState callbacks both read/write the
migration stream (which should be done in coroutine context) and set the
device state (which can involve functions that must not run in coroutine
context). Untangling this, if possible at all, is not easy and certainly
not something for stable releases.
Kevin
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()
2023-09-07 18:40 ` Stefan Hajnoczi
@ 2023-09-08 8:59 ` Kevin Wolf
2023-09-08 10:46 ` Stefan Hajnoczi
0 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2023-09-08 8:59 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, mst, quintela, peterx, leobras, pbonzini, qemu-block,
qemu-stable
[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]
Am 07.09.2023 um 20:40 hat Stefan Hajnoczi geschrieben:
> On Tue, Sep 05, 2023 at 04:50:02PM +0200, Kevin Wolf wrote:
> > virtio_load() as a whole should run in coroutine context because it
> > reads from the migration stream and we don't want this to block.
>
> Is that "should" a "must" or a "can"?
>
> If it's a "must" then virtio_load() needs assert(qemu_in_coroutine()).
>
> But the previous patch mentioned that loadvm for snapshots calls it
> outside coroutine context. So maybe it's a "can"?
Where this makes a difference is when the function indirectly calls into
QIOChannel. When called from a coroutine, it yields while waiting for
I/O, and outside of a coroutine it blocks. Yielding is always
preferable, but in cases like HMP savevm/loadvm we also don't really
care because it's synchronous anyway.
Whether that makes it a MAY or a SHOULD in the RFC sense, you decide.
If you wanted to make it a MUST, you'd need to check all callers first
and change some of them.
Kevin
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()
2023-09-08 8:59 ` Kevin Wolf
@ 2023-09-08 10:46 ` Stefan Hajnoczi
0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-09-08 10:46 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, mst, quintela, peterx, leobras, pbonzini, qemu-block,
qemu-stable
[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]
On Fri, Sep 08, 2023 at 10:59:48AM +0200, Kevin Wolf wrote:
> Am 07.09.2023 um 20:40 hat Stefan Hajnoczi geschrieben:
> > On Tue, Sep 05, 2023 at 04:50:02PM +0200, Kevin Wolf wrote:
> > > virtio_load() as a whole should run in coroutine context because it
> > > reads from the migration stream and we don't want this to block.
> >
> > Is that "should" a "must" or a "can"?
> >
> > If it's a "must" then virtio_load() needs assert(qemu_in_coroutine()).
> >
> > But the previous patch mentioned that loadvm for snapshots calls it
> > outside coroutine context. So maybe it's a "can"?
>
> Where this makes a difference is when the function indirectly calls into
> QIOChannel. When called from a coroutine, it yields while waiting for
> I/O, and outside of a coroutine it blocks. Yielding is always
> preferable, but in cases like HMP savevm/loadvm we also don't really
> care because it's synchronous anyway.
>
> Whether that makes it a MAY or a SHOULD in the RFC sense, you decide.
> If you wanted to make it a MUST, you'd need to check all callers first
> and change some of them.
Thanks for clarifying. It is "can".
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()
2023-09-05 14:50 [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
` (2 preceding siblings ...)
2023-09-07 18:42 ` [PATCH 0/2] " Stefan Hajnoczi
@ 2023-09-08 10:46 ` Stefan Hajnoczi
3 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-09-08 10:46 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, mst, quintela, peterx, leobras, pbonzini, qemu-block,
qemu-stable
[-- Attachment #1: Type: text/plain, Size: 888 bytes --]
On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> This fixes a recently introduced assertion failure that was reported to
> happen when migrating virtio-net with a failover. The latent bug that
> we're executing code in coroutine context that was never supposed to run
> there has existed for a long time. However, the new assertion that
> callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> makes it very visible because it's now always a crash.
>
> Kevin Wolf (2):
> vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
> virtio: Drop out of coroutine context in virtio_load()
>
> include/migration/vmstate.h | 8 ++++---
> hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++-----
> 2 files changed, 45 insertions(+), 8 deletions(-)
>
> --
> 2.41.0
>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()
2023-09-05 14:50 ` [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
2023-09-07 18:40 ` Stefan Hajnoczi
@ 2023-10-17 5:19 ` Michael Tokarev
2023-10-17 8:48 ` Kevin Wolf
2023-10-17 13:06 ` Juan Quintela
1 sibling, 2 replies; 13+ messages in thread
From: Michael Tokarev @ 2023-10-17 5:19 UTC (permalink / raw)
To: Kevin Wolf, qemu-devel
Cc: mst, quintela, peterx, leobras, stefanha, pbonzini, qemu-block,
qemu-stable
05.09.2023 17:50, Kevin Wolf wrote:
> virtio_load() as a whole should run in coroutine context because it
> reads from the migration stream and we don't want this to block.
>
> However, it calls virtio_set_features_nocheck() and devices don't
> expect their .set_features callback to run in a coroutine and therefore
> call functions that may not be called in coroutine context. To fix this,
> drop out of coroutine context for calling virtio_set_features_nocheck().
...
> Cc: qemu-stable@nongnu.org
> Buglink: https://issues.redhat.com/browse/RHEL-832
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It looks like this change caused an interesting regression,
https://gitlab.com/qemu-project/qemu/-/issues/1933
at least in -stable. Can you take a look please?
BTW, Kevin, do you have account @gitlab?
Thanks,
/mjt
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()
2023-10-17 5:19 ` Michael Tokarev
@ 2023-10-17 8:48 ` Kevin Wolf
2023-10-17 13:06 ` Juan Quintela
1 sibling, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2023-10-17 8:48 UTC (permalink / raw)
To: Michael Tokarev
Cc: qemu-devel, mst, quintela, peterx, leobras, stefanha, pbonzini,
qemu-block, qemu-stable
Am 17.10.2023 um 07:19 hat Michael Tokarev geschrieben:
> 05.09.2023 17:50, Kevin Wolf wrote:
> > virtio_load() as a whole should run in coroutine context because it
> > reads from the migration stream and we don't want this to block.
> >
> > However, it calls virtio_set_features_nocheck() and devices don't
> > expect their .set_features callback to run in a coroutine and therefore
> > call functions that may not be called in coroutine context. To fix this,
> > drop out of coroutine context for calling virtio_set_features_nocheck().
> ...
> > Cc: qemu-stable@nongnu.org
> > Buglink: https://issues.redhat.com/browse/RHEL-832
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>
> It looks like this change caused an interesting regression,
> https://gitlab.com/qemu-project/qemu/-/issues/1933
> at least in -stable. Can you take a look please?
Huh?! This is an interesting one indeed.
I can't see any direct connection between the patch and this regression.
Random memory corruption is the only explanation I have. But I'm not
sure how this patch could cause it, it's quite simple.
The next step is probably trying to find a simple reproducer on the QEMU
level. And then maybe valgrind or we could get stack traces for the
call to virtio_set_features_nocheck_maybe_co(). Also the stack trace for
the crash and maybe the content of 's' would be interesting - we can ask
the reporter for that, the core dump should be enough for that.
Another potentially interesting question is whether after yielding, the
coroutine is indeed reentered from the aio_co_wake() call in the patch
or if something else wakes it up. If it were the latter, that could
explain memory corruption.
> BTW, Kevin, do you have account @gitlab?
Yes, @kmwolf.
Kevin
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()
2023-10-17 5:19 ` Michael Tokarev
2023-10-17 8:48 ` Kevin Wolf
@ 2023-10-17 13:06 ` Juan Quintela
1 sibling, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2023-10-17 13:06 UTC (permalink / raw)
To: Michael Tokarev
Cc: Kevin Wolf, qemu-devel, mst, peterx, leobras, stefanha, pbonzini,
qemu-block, qemu-stable
Michael Tokarev <mjt@tls.msk.ru> wrote:
> 05.09.2023 17:50, Kevin Wolf wrote:
>> virtio_load() as a whole should run in coroutine context because it
>> reads from the migration stream and we don't want this to block.
>> However, it calls virtio_set_features_nocheck() and devices don't
>> expect their .set_features callback to run in a coroutine and therefore
>> call functions that may not be called in coroutine context. To fix this,
>> drop out of coroutine context for calling virtio_set_features_nocheck().
> ...
>> Cc: qemu-stable@nongnu.org
>> Buglink: https://issues.redhat.com/browse/RHEL-832
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>
> It looks like this change caused an interesting regression,
> https://gitlab.com/qemu-project/qemu/-/issues/1933
> at least in -stable. Can you take a look please?
>
> BTW, Kevin, do you have account @gitlab?
Dunno what is going on here, but failing postcopy is weird.
2023-10-12T06:23:44.354387Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2892749 kHz) and host (2799999 kHz), and TSC scaling unavailable 2023-10-12T06:23:44.354538Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2892749 kHz) and host (2799999 kHz), and TSC scaling unavailable
I hope/guess that the problem is not TSC related?
i.e. does other tests work between this two machines?
Once discarding that, we get on source:
2023-10-12 06:23:43.412+0000: initiating migration
2023-10-12T06:23:44.362392Z qemu-system-x86_64: failed to save
SaveStateEntry with id(name): 3(ram): -5
So migration was aborted, and -5 is EIO on my system.
So we are having trouble here with a write() somewhere.
Later, Juan.
> Thanks,
>
> /mjt
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2023-10-17 13:07 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-05 14:50 [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
2023-09-05 14:50 ` [PATCH 1/2] vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn Kevin Wolf
2023-09-05 15:06 ` Peter Xu
2023-09-05 14:50 ` [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load() Kevin Wolf
2023-09-07 18:40 ` Stefan Hajnoczi
2023-09-08 8:59 ` Kevin Wolf
2023-09-08 10:46 ` Stefan Hajnoczi
2023-10-17 5:19 ` Michael Tokarev
2023-10-17 8:48 ` Kevin Wolf
2023-10-17 13:06 ` Juan Quintela
2023-09-07 18:42 ` [PATCH 0/2] " Stefan Hajnoczi
2023-09-08 8:53 ` Kevin Wolf
2023-09-08 10:46 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).