* [Qemu-devel] [PATCH] qemu-iotests: fix 203 migration completion race
@ 2018-03-05 15:59 Stefan Hajnoczi
2018-03-05 16:04 ` Max Reitz
0 siblings, 1 reply; 4+ messages in thread
From: Stefan Hajnoczi @ 2018-03-05 15:59 UTC (permalink / raw)
To: qemu-devel; +Cc: Max Reitz, qemu-block, Kevin Wolf, Stefan Hajnoczi
There is a race between the test's 'query-migrate' QMP command after the
QMP 'STOP' event and completing the migration:
The test case invokes 'query-migrate' upon receiving 'STOP'. At this
point the migration thread may still be in the process of completing.
Therefore 'query-migrate' can return 'status': 'active' for a brief
window of time instead of 'status': 'completed'. This results in
qemu-iotests 203 hanging.
Solve the race by enabling the 'events' migration capability, which
causes QEMU to emit migration-specific QMP events that do not suffer
from this race condition. Wait for the QMP 'MIGRATION' event with
'status': 'completed'.
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
tests/qemu-iotests/203 | 15 +++++++++++----
tests/qemu-iotests/203.out | 5 +++++
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/tests/qemu-iotests/203 b/tests/qemu-iotests/203
index 2c811917d8..4874a1a0d8 100755
--- a/tests/qemu-iotests/203
+++ b/tests/qemu-iotests/203
@@ -49,11 +49,18 @@ with iotests.FilePath('disk0.img') as disk0_img_path, \
node_name='drive1-node', iothread='iothread0',
force=True))
+ iotests.log('Enabling migration QMP events...')
+ iotests.log(vm.qmp('migrate-set-capabilities', capabilities=[
+ {
+ 'capability': 'events',
+ 'state': True
+ }
+ ]))
+
iotests.log('Starting migration...')
iotests.log(vm.qmp('migrate', uri='exec:cat >/dev/null'))
while True:
- vm.get_qmp_event(wait=60.0)
- result = vm.qmp('query-migrate')
- status = result.get('return', {}).get('status', None)
- if status == 'completed':
+ event = vm.event_wait('MIGRATION')
+ iotests.log(event, filters=[iotests.filter_qmp_event])
+ if event['data']['status'] == 'completed':
break
diff --git a/tests/qemu-iotests/203.out b/tests/qemu-iotests/203.out
index 3f1ff900e4..1a11f0975c 100644
--- a/tests/qemu-iotests/203.out
+++ b/tests/qemu-iotests/203.out
@@ -2,5 +2,10 @@ Launching VM...
Setting IOThreads...
{u'return': {}}
{u'return': {}}
+Enabling migration QMP events...
+{u'return': {}}
Starting migration...
{u'return': {}}
+{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'setup'}, u'event': u'MIGRATION'}
+{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'active'}, u'event': u'MIGRATION'}
+{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'completed'}, u'event': u'MIGRATION'}
--
2.14.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [PATCH] qemu-iotests: fix 203 migration completion race
2018-03-05 15:59 [Qemu-devel] [PATCH] qemu-iotests: fix 203 migration completion race Stefan Hajnoczi
@ 2018-03-05 16:04 ` Max Reitz
2018-03-06 16:18 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
0 siblings, 1 reply; 4+ messages in thread
From: Max Reitz @ 2018-03-05 16:04 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel; +Cc: qemu-block, Kevin Wolf
[-- Attachment #1: Type: text/plain, Size: 1093 bytes --]
On 2018-03-05 16:59, Stefan Hajnoczi wrote:
> There is a race between the test's 'query-migrate' QMP command after the
> QMP 'STOP' event and completing the migration:
>
> The test case invokes 'query-migrate' upon receiving 'STOP'. At this
> point the migration thread may still be in the process of completing.
> Therefore 'query-migrate' can return 'status': 'active' for a brief
> window of time instead of 'status': 'completed'. This results in
> qemu-iotests 203 hanging.
>
> Solve the race by enabling the 'events' migration capability, which
> causes QEMU to emit migration-specific QMP events that do not suffer
> from this race condition. Wait for the QMP 'MIGRATION' event with
> 'status': 'completed'.
>
> Reported-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tests/qemu-iotests/203 | 15 +++++++++++----
> tests/qemu-iotests/203.out | 5 +++++
> 2 files changed, 16 insertions(+), 4 deletions(-)
So much for "the ppoll() dungeon"...
Thanks!
Reviewed-by: Max Reitz <mreitz@redhat.com>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
2018-03-05 16:04 ` Max Reitz
@ 2018-03-06 16:18 ` Stefan Hajnoczi
2018-03-07 18:01 ` Max Reitz
0 siblings, 1 reply; 4+ messages in thread
From: Stefan Hajnoczi @ 2018-03-06 16:18 UTC (permalink / raw)
To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block
[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]
On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote:
> On 2018-03-05 16:59, Stefan Hajnoczi wrote:
> > There is a race between the test's 'query-migrate' QMP command after the
> > QMP 'STOP' event and completing the migration:
> >
> > The test case invokes 'query-migrate' upon receiving 'STOP'. At this
> > point the migration thread may still be in the process of completing.
> > Therefore 'query-migrate' can return 'status': 'active' for a brief
> > window of time instead of 'status': 'completed'. This results in
> > qemu-iotests 203 hanging.
> >
> > Solve the race by enabling the 'events' migration capability, which
> > causes QEMU to emit migration-specific QMP events that do not suffer
> > from this race condition. Wait for the QMP 'MIGRATION' event with
> > 'status': 'completed'.
> >
> > Reported-by: Max Reitz <mreitz@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > tests/qemu-iotests/203 | 15 +++++++++++----
> > tests/qemu-iotests/203.out | 5 +++++
> > 2 files changed, 16 insertions(+), 4 deletions(-)
>
> So much for "the ppoll() dungeon"...
It was still a pain to debug :).
I put a ring buffer into the QMP monitor input/output code. Then it was
possible to figure out the issue via GDB on a hung QEMU:
(gdb) p current_run_state
RUN_STATE_POSTMIGRATE
(gdb) p current_migration->status
MIGRATION_STATUS_COMPLETED
(gdb) p monitor_out_ring
...'STOP' event...
(gdb) p monitor_in_ring
...query-migrate... <-- okay, the test checked if migration finished
Then looking at the code:
static void migration_completion(MigrationState *s)
{
...
if (s->state == MIGRATION_STATUS_ACTIVE) {
qemu_mutex_lock_iothread();
s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
s->vm_was_running = runstate_is_running();
ret = global_state_store();
if (!ret) {
bool inactivate = !migrate_colo_enabled();
v---- The stop event comes from here
ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
...
}
qemu_mutex_unlock_iothread(); <--- oh, no!
...
if (!migrate_colo_enabled()) {
migrate_set_state(&s->state, current_active_state,
MIGRATION_STATUS_COMPLETED); <-- too late!
}
return;
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
2018-03-06 16:18 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-03-07 18:01 ` Max Reitz
0 siblings, 0 replies; 4+ messages in thread
From: Max Reitz @ 2018-03-07 18:01 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block
[-- Attachment #1: Type: text/plain, Size: 2949 bytes --]
On 2018-03-06 17:18, Stefan Hajnoczi wrote:
> On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote:
>> On 2018-03-05 16:59, Stefan Hajnoczi wrote:
>>> There is a race between the test's 'query-migrate' QMP command after the
>>> QMP 'STOP' event and completing the migration:
>>>
>>> The test case invokes 'query-migrate' upon receiving 'STOP'. At this
>>> point the migration thread may still be in the process of completing.
>>> Therefore 'query-migrate' can return 'status': 'active' for a brief
>>> window of time instead of 'status': 'completed'. This results in
>>> qemu-iotests 203 hanging.
>>>
>>> Solve the race by enabling the 'events' migration capability, which
>>> causes QEMU to emit migration-specific QMP events that do not suffer
>>> from this race condition. Wait for the QMP 'MIGRATION' event with
>>> 'status': 'completed'.
>>>
>>> Reported-by: Max Reitz <mreitz@redhat.com>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>> tests/qemu-iotests/203 | 15 +++++++++++----
>>> tests/qemu-iotests/203.out | 5 +++++
>>> 2 files changed, 16 insertions(+), 4 deletions(-)
>>
>> So much for "the ppoll() dungeon"...
>
> It was still a pain to debug :).
>
> I put a ring buffer into the QMP monitor input/output code.
Oh, wow.
> Then it was
> possible to figure out the issue via GDB on a hung QEMU:
>
> (gdb) p current_run_state
> RUN_STATE_POSTMIGRATE
> (gdb) p current_migration->status
> MIGRATION_STATUS_COMPLETED
> (gdb) p monitor_out_ring
> ...'STOP' event...
> (gdb) p monitor_in_ring
> ...query-migrate... <-- okay, the test checked if migration finished
>
> Then looking at the code:
>
> static void migration_completion(MigrationState *s)
> {
> ...
> if (s->state == MIGRATION_STATUS_ACTIVE) {
> qemu_mutex_lock_iothread();
> s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> s->vm_was_running = runstate_is_running();
> ret = global_state_store();
>
> if (!ret) {
> bool inactivate = !migrate_colo_enabled();
>
> v---- The stop event comes from here
> ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> ...
> }
> qemu_mutex_unlock_iothread(); <--- oh, no!
> ...
> if (!migrate_colo_enabled()) {
> migrate_set_state(&s->state, current_active_state,
> MIGRATION_STATUS_COMPLETED); <-- too late!
> }
>
> return;
OK... I guess the answer to this just is "the stop event doesn't mean
anything, use the migration events instead" (i.e. what your patch does).
Thanks a lot, applied to my block branch:
https://github.com/XanClic/qemu/commits/block
Max
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-03-07 18:02 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-05 15:59 [Qemu-devel] [PATCH] qemu-iotests: fix 203 migration completion race Stefan Hajnoczi
2018-03-05 16:04 ` Max Reitz
2018-03-06 16:18 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-03-07 18:01 ` Max Reitz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).