qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Max Reitz <mreitz@redhat.com>, Qemu-block <qemu-block@nongnu.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	"eblake@redhat.com" <eblake@redhat.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] Aborts in iotest 169
Date: Thu, 24 Jan 2019 11:45:01 +0100	[thread overview]
Message-ID: <20190124104501.GE4601@localhost.localdomain> (raw)
In-Reply-To: <5310ca7f-d76d-0ff6-804d-2d6269243497@virtuozzo.com>

Am 24.01.2019 um 11:32 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 24.01.2019 13:15, Kevin Wolf wrote:
> > Am 24.01.2019 um 10:29 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 23.01.2019 18:48, Max Reitz wrote:
> >>> Hi,
> >>>
> >>> When running 169 in parallel (e.g. like so:
> >>>
> >>> $ while TEST_DIR=/tmp/t0 ./check -T -qcow2 169; do; done
> >>> $ while TEST_DIR=/tmp/t1 ./check -T -qcow2 169; do; done
> >>> $ while TEST_DIR=/tmp/t2 ./check -T -qcow2 169; do; done
> >>> $ while TEST_DIR=/tmp/t3 ./check -T -qcow2 169; do; done
> >>>
> >>> in four different shells), I get aborts:
> >>>
> >>> (Often I get segfaults, but that's because of
> >>> http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05579.html --
> >>> feel free to apply the attached patch to make them go away)
> >>>
> >>>
> >>> WARNING:qemu:qemu received signal 6:
> >>> build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
> >>> -chardev socket,id=mon,path=/tmp/t0/tmpbX30XU/qemua-25745-monitor.sock
> >>> -mon chardev=mon,mode=control -display none -vga none -qtest
> >>> unix:path=/tmp/t0/qemua-25745-qtest.sock -machine accel=qtest
> >>> -nodefaults -machine accel=qtest -drive
> >>> if=virtio,id=drive0,file=/tmp/t0/disk_a,format=qcow2,cache=writeback
> >>> .................E..
> >>> ======================================================================
> >>> ERROR:
> >>> test_do_test_migration_resume_source_not_persistent__not_migbitmap
> >>> (__main__.TestDirtyBitmapMigration)
> >>> ----------------------------------------------------------------------
> >>> Traceback (most recent call last):
> >>>     File "169", line 206, in <lambda>
> >>>       setattr(klass, 'test_' + method + name, lambda self: mc(self))
> >>>     File "169", line 113, in do_test_migration_resume_source
> >>>       self.check_bitmap(self.vm_a, sha256)
> >>>     File "169", line 72, in check_bitmap
> >>>       node='drive0', name='bitmap0')
> >>>     File "tests/qemu-iotests/../../scripts/qemu.py", line 369, in qmp
> >>>       return self._qmp.cmd(cmd, args=qmp_args)
> >>>     File "tests/qemu-iotests/../../scripts/qmp/qmp.py", line 191, in cmd
> >>>       return self.cmd_obj(qmp_cmd)
> >>>     File "tests/qemu-iotests/../../scripts/qmp/qmp.py", line 174, in cmd_obj
> >>>       resp = self.__json_read()
> >>>     File "tests/qemu-iotests/../../scripts/qmp/qmp.py", line 82, in
> >>> __json_read
> >>>       data = self.__sockfile.readline()
> >>>     File "/usr/lib64/python2.7/socket.py", line 451, in readline
> >>>       data = self._sock.recv(self._rbufsize)
> >>> error: [Errno 104] Connection reset by peer
> >>>
> >>> ----------------------------------------------------------------------
> >>> Ran 20 tests
> >>>
> >>> FAILED (errors=1)
> >>>
> >>>
> >>> Or:
> >>>
> >>> WARNING:qemu:qemu received signal 6:
> >>> build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
> >>> -chardev socket,id=mon,path=/tmp/t3/tmp0pllWD/qemua-3445-monitor.sock
> >>> -mon chardev=mon,mode=control -display none -vga none -qtest
> >>> unix:path=/tmp/t3/qemua-3445-qtest.sock -machine accel=qtest -nodefaults
> >>> -machine accel=qtest -drive
> >>> if=virtio,id=drive0,file=/tmp/t3/disk_a,format=qcow2,cache=writeback
> >>> WARNING:qemu:qemu received signal 6:
> >>> build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
> >>> -chardev socket,id=mon,path=/tmp/t3/tmp0pllWD/qemua-3445-monitor.sock
> >>> -mon chardev=mon,mode=control -display none -vga none -qtest
> >>> unix:path=/tmp/t3/qemua-3445-qtest.sock -machine accel=qtest -nodefaults
> >>> -machine accel=qtest -drive
> >>> if=virtio,id=drive0,file=/tmp/t3/disk_a,format=qcow2,cache=writeback
> >>>
> >>> ...................F
> >>> ======================================================================
> >>> FAIL: test_do_test_migration_resume_source_persistent__not_migbitmap
> >>> (__main__.TestDirtyBitmapMigration)
> >>> ----------------------------------------------------------------------
> >>> Traceback (most recent call last):
> >>>     File "169", line 206, in <lambda>
> >>>       setattr(klass, 'test_' + method + name, lambda self: mc(self))
> >>>     File "169", line 125, in do_test_migration_resume_source
> >>>       self.assertEqual(log, '')
> >>> AssertionError: "qemu-system-x86_64: invalid runstate transition:
> >>> 'running' -> 'postmigrate'\n" != ''
> >>>
> >>> ----------------------------------------------------------------------
> >>> Ran 20 tests
> >>>
> >>> FAILED (failures=1)
> >>>
> >>>
> >>> The backtrace always goes like this:
> >>>
> >>> (gdb) bt
> >>> #0  0x00007f0acf5cc53f in raise () at /lib64/libc.so.6
> >>> #1  0x00007f0acf5b6895 in abort () at /lib64/libc.so.6
> >>> #2  0x000055a46ebbb1a6 in runstate_set (new_state=RUN_STATE_POSTMIGRATE)
> >>> at vl.c:742
> >>> #3  0x000055a46ebbb1a6 in runstate_set
> >>> (new_state=new_state@entry=RUN_STATE_POSTMIGRATE) at vl.c:730
> >>> #4  0x000055a46ed39129 in migration_iteration_finish (s=0x55a4708be000)
> >>> at migration/migration.c:2972
> >>> #5  0x000055a46ed39129 in migration_thread
> >>> (opaque=opaque@entry=0x55a4708be000) at migration/migration.c:3130
> >>> #6  0x000055a46eea665a in qemu_thread_start (args=<optimized out>) at
> >>> util/qemu-thread-posix.c:502
> >>>
> >>>
> >>> #7  0x00007f0acf76258e in start_thread () at /lib64/libpthread.so.0
> >>> #8  0x00007f0acf6916a3 in clone () at /lib64/libc.so.6
> >>> (gdb) frame 2
> >>> #2  0x000055a46ebbb1a6 in runstate_set (new_state=RUN_STATE_POSTMIGRATE)
> >>> at vl.c:742
> >>> 742             abort();
> >>> (gdb) print current_run_state
> >>> $1 = RUN_STATE_RUNNING
> >>>
> >>>
> >>> Neither of migration or runstates are my strong suite, so I thought I'd
> >>> report it before diving into it.
> >>>
> >>> Max
> >>>
> >>
> >> [...]
> >> 450556@1548321072.888979:migrate_set_state new state active
> >> 450556@1548321072.889022:migration_thread_setup_complete
> >> 450556@1548321072.988275:migrate_transferred transferred 985298 time_spent 100 bandwidth 9852 max_size 2955894
> >> 450556@1548321072.988283:migration_bitmap_sync_start
> >> 450556@1548321072.988293:migration_bitmap_sync_end dirty_pages 32898
> >> 450556@1548321072.988495:migration_thread_low_pending 2048
> >> migration_completion
> >> 450556@1548321072.988541:runstate_set current_state 9 new_state 7
> >> 450556@1548321072.988780:migration_bitmap_sync_start
> >> 450556@1548321072.988790:migration_bitmap_sync_end dirty_pages 32898
> >> 450556@1548321072.993112:migrate_global_state_pre_save saved state: running
> >> [1] 450556@1548321072.993237:migrate_set_state new state completed
> >> 450549@1548321072.993668:handle_qmp_command mon 0x55aebef99830 req: {"execute": "x-debug-block-dirty-bitmap-sha256", "arguments": {"name": "bitmap0", "node": "drive0"}}
> >> 450556@1548321072.993697:migration_thread_after_loop
> >> [2] 450549@1548321072.993917:handle_qmp_command mon 0x55aebef99830 req: {"execute": "cont"}
> >> 450476@1548321072.994461:runstate_set current_state 7 new_state 9
> >> qemu-system-x86_64: invalid runstate transition: 'running' -> 'postmigrate'
> >>
> >>
> >> Aha, so, in test we wait for MIGRATION COMPLETED [1] and run qmp_cont [2] and go to RUNNING.
> >> then in migration thread we go to migration_iteration_finish() and fail to go to POSTMIGRATE.
> >> (note, that this testcase is about resuming source after migration)
> >>
> >> So, fix for test may be additionally wait for POSTMIGRATE, not only for MIGRATION COMPLETION.
> >>
> >> But how to fix Qemu not to crash? May be, forbid some transitions (FINISH_MIGRATE -> RUNNING),
> >>    or at least error-out qmp_cont if runstate is  FINISH_MIGRATE?
> > 
> > qmp_cont currently checks for RUN_STATE_INMIGRATE and gives it special
> > treatment. As a quick hack, doing the same for RUN_STATE_FINISH_MIGRATE
> > might fix the problem.
> > A better simple fix would possibly include a
> > whitelist of states where you can use 'cont' to avoid that the user
> > messes with other internal states.
> > 
> > However, like the problem that Dave reported a few days ago (migrating
> > twice leads to a crash because we try to inactivate already inactive
> > block devices), I think this is a symptom of a larger problem. We're
> > not taking the state machine serious enough. One-off checks here and
> > there are unlikely to ever fully solve the problem of running a command
> > in a runstate in which it was never supposed to run.
> > 
> > I wonder whether the QAPI schema should have a field 'run-states' for
> > commands, and by default we would only include states where the VM has
> > ownership of its resources (e.g. images are activated) and which are not
> > temporary states that are automatically left, like finish-migrate.
> > 
> > Then the default for commands is to be rejected in "unusual" runstates
> > where we're not expecting user intervention, and we must explicitly
> > allow them if they are okay, in fact.
> > 
> > Instead of listing every obscure runstate, maybe we should really use
> > categories of runstates instead:
> > 
> > 1. Running
> > 2. Paused, owns all resources (like disk images)
> > 3. Paused, doesn't own some resources (source VM after migration
> >     completes, destination before migration completes)
> > 4. Paused temporarily for internal reasons (e.g. finish-migrate,
> >     restore-vm, save-vm)
> > 
> > Most commands should be okay with 1 and 2, but possibly not 3, and
> > almost never 4.
> 
> And then we need a mechanism to block state transition while some
> operations which don't support new state are ongoing.

This essentially means not allowing those QMP commands that could cause
a state transition (primarily cont/stop, but probably some more).

State transitions that happen automatically should in theory always be
correct, everything else is a bug and abort() is the right way to stop
them (like we already do).

Kevin

  reply	other threads:[~2019-01-24 10:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-23 15:48 [Qemu-devel] Aborts in iotest 169 Max Reitz
2019-01-23 16:04 ` Luiz Capitulino
2019-01-23 16:12   ` Max Reitz
2019-01-23 16:24     ` Luiz Capitulino
2019-01-23 16:35       ` Dr. David Alan Gilbert
2019-01-23 17:16         ` Max Reitz
2019-01-23 18:08           ` Dr. David Alan Gilbert
2019-01-24  8:08             ` Vladimir Sementsov-Ogievskiy
2019-01-24  9:29 ` Vladimir Sementsov-Ogievskiy
2019-01-24  9:49   ` Vladimir Sementsov-Ogievskiy
2019-01-24 10:10     ` Dr. David Alan Gilbert
2019-01-24 10:23       ` Vladimir Sementsov-Ogievskiy
2019-01-24 10:52         ` Dr. David Alan Gilbert
2019-01-24 10:15   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2019-01-24 10:32     ` Vladimir Sementsov-Ogievskiy
2019-01-24 10:45       ` Kevin Wolf [this message]
2019-01-24 10:49     ` Dr. David Alan Gilbert
2019-01-24 11:58       ` Kevin Wolf
2019-01-24 20:15         ` Dr. David Alan Gilbert
2019-01-24 11:53     ` Vladimir Sementsov-Ogievskiy
2019-01-24 17:58     ` Eric Blake
2019-04-08 14:57 ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2019-04-08 14:57   ` Vladimir Sementsov-Ogievskiy
2019-04-08 15:08   ` Max Reitz
2019-04-08 15:08     ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190124104501.GE4601@localhost.localdomain \
    --to=kwolf@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).