From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 21 Jan 2019 16:55:53 +0100
From: Kevin Wolf
Message-ID: <20190121155553.GD5638@linux.fritz.box>
References: <20190114105132.GA2524@work-vm> <20190114115205.GD6837@linux.fritz.box> <20190118155703.GF2146@work-vm>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190118155703.GF2146@work-vm>
Subject: Re: [Qemu-devel] 3.1: second invocation of migrate crashes qemu
To: "Dr. David Alan Gilbert"
Cc: Michael Tokarev, quintela@redhat.com, qemu-devel

On 18.01.2019 at 16:57, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 14.01.2019 at 11:51, Dr. David Alan Gilbert wrote:
> > > * Michael Tokarev (mjt@tls.msk.ru) wrote:
> > > > $ qemu-system-x86_64 -monitor stdio -hda foo.img
> > > > QEMU 3.1.0 monitor - type 'help' for more information
> > > > (qemu) stop
> > > > (qemu) migrate "exec:cat >/dev/null"
> > > > (qemu) migrate "exec:cat >/dev/null"
> > > > qemu-system-x86_64: /build/qemu/qemu-3.1/block.c:4647: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
> > > > Aborted
> > >
> > > And on head as well; it only happens if the 1st migrate is successful;
> > > if it got cancelled the 2nd one works, so it's not too bad.
> > >
> > > I suspect the problem here is all around locking/ownership - the block
> > > devices get shut down at the end of migration since the assumption is
> > > that the other end has them open now and we had better release them.
> >
> > Yes, only "cont" gets control back to the source VM.
> >
> > I think we really should limit the possible monitor commands in the
> > postmigrate status, and possibly provide a way to get back to the
> > regular paused state (which means getting back control of the resources)
> > without resuming the VM first.
>
> This error is a little interesting if you'd done something like:
>
>   src:
>      stop
>      migrate
>
>   dst:
>
>      start a new qemu
>
>   src:
>      migrate
>
> Now that used to work (safely) - note we've not started
> a VM successfully anywhere else.
>
> Now the source refuses to let that happen - with a rather
> nasty abort.

Essentially it's another effect of the problem that migration has always
lacked a proper model of ownership transfer. And it still treats this as
a block layer problem rather than making it a core concept of migration,
as it should.

We can stack another one-off fix on top, and get back control of the
block devices automatically on a second 'migrate'. But it feels like a
hack, and not as if VMs had a properly designed and respected state
machine.

Kevin
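
P.S. To make the state machine idea a bit more concrete, here is a rough
sketch of an explicit transition table that tracks block device ownership
and rejects a second 'migrate' with an error instead of hitting the
assertion. It is only an illustration under simplifying assumptions, not
QEMU code: the class names, the 'reclaim' command and the table itself are
hypothetical, and 'migrate' is assumed to always complete successfully.

```python
from enum import Enum, auto


class RunState(Enum):
    RUNNING = auto()
    PAUSED = auto()
    POSTMIGRATE = auto()


class CommandNotAllowed(Exception):
    """Raised when a monitor command is not valid in the current state."""


# (state, command) -> (next state, source owns the block devices afterwards)
# Hypothetical table; 'reclaim' is an invented command name standing in for
# "get back to the paused state without resuming the VM first".
TRANSITIONS = {
    (RunState.RUNNING, "stop"): (RunState.PAUSED, True),
    (RunState.RUNNING, "migrate"): (RunState.POSTMIGRATE, False),
    (RunState.PAUSED, "cont"): (RunState.RUNNING, True),
    (RunState.PAUSED, "migrate"): (RunState.POSTMIGRATE, False),
    (RunState.POSTMIGRATE, "cont"): (RunState.RUNNING, True),
    (RunState.POSTMIGRATE, "reclaim"): (RunState.PAUSED, True),
}


class SourceVM:
    """Toy model of the source side; migration is assumed to succeed."""

    def __init__(self):
        self.state = RunState.RUNNING
        self.owns_block_devices = True

    def command(self, name):
        try:
            next_state, owns = TRANSITIONS[(self.state, name)]
        except KeyError:
            # Unknown transition: refuse cleanly and leave the state untouched.
            raise CommandNotAllowed(
                f"'{name}' is not allowed in state {self.state.name}"
            ) from None
        self.state = next_state
        self.owns_block_devices = owns


vm = SourceVM()
vm.command("stop")
vm.command("migrate")          # ownership is handed over here
try:
    vm.command("migrate")      # rejected: the source no longer owns the disks
except CommandNotAllowed as err:
    print(err)
vm.command("reclaim")          # take the resources back without resuming
vm.command("migrate")          # now a second migration is well-defined
```

The point of the table is that ownership changes only as part of a
transition, so every monitor command can be checked against it in one
place rather than in the block layer.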