From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 8 Jul 2015 11:43:27 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20150708104326.GE2463@work-vm>
References: <1436274549-28826-1-git-send-email-quintela@redhat.com>
 <1436274549-28826-16-git-send-email-quintela@redhat.com>
 <559CF767.3060000@de.ibm.com>
 <20150708101415.GD2463@work-vm>
 <559CFD2D.2040806@de.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <559CFD2D.2040806@de.ibm.com>
Subject: Re: [Qemu-devel] [PULL 15/28] migration: create new section to store global state
To: Christian Borntraeger
Cc: amit.shah@redhat.com, Cornelia Huck, qemu-devel@nongnu.org, Juan Quintela

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 08.07.2015 at 12:14, Dr. David Alan Gilbert wrote:
> > * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >> On 07.07.2015 at 15:08, Juan Quintela wrote:
> >>> This includes a new section that for now just stores the current qemu state.
> >>>
> >>> Right now, there is only one way to control what the state of the
> >>> target is after migration.
> >>>
> >>> - If you run the target qemu with -S, it would start stopped.
> >>> - If you run the target qemu without -S, it would run just after migration finishes.
> >>>
> >>> The problem here is what happens if we start the target without -S and
> >>> an error occurs during migration that sets the current state to
> >>> -EIO. Migration would end (notice that the error happened doing block
> >>> IO, network IO, i.e. nothing related to migration), and when
> >>> migration finishes, we would just "continue" running on the destination,
> >>> probably hanging the guest, corrupting data, whatever.
> >>>
> >>> Signed-off-by: Juan Quintela
> >>> Reviewed-by: Dr. David Alan Gilbert
> >>
> >> This is bisected to cause a regression on s390.
> >>
> >> A guest restarts (booting) after managedsave/start instead of continuing.
> >>
> >> Do you have any idea what might be wrong?
> >
> > I'd add some debug to the pre_save and post_load to see what state value is
> > being saved/restored.
> >
> > Also, does that regression happen when doing the save/restore using the same/latest
> > git, or is it a load from an older version?
>
> Seems to happen only with some guest definitions, but I can't really pinpoint it yet.
> e.g. removing queues='4' from my network card solved it for a reduced xml, but
> doing the same on a bigger xml was not enough :-/

Nasty; still, the 'paused' value in the pre-save/post-load feels right.
I've read through the patch again and it still feels right to me, so I don't
see anything obvious.

Perhaps it's worth turning on the migration tracing on both sides and seeing
what's different with that 'queues=4'?

Dave
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK