From: Anthony Liguori
Date: Tue, 25 May 2010 11:10:23 -0500
Subject: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
To: Juan Quintela
Cc: Luiz Capitulino, qemu-devel@nongnu.org, Markus Armbruster
Message-ID: <4BFBF66F.3030702@codemonkey.ws>

On 05/25/2010 11:04 AM, Juan Quintela wrote:
> Anthony Liguori wrote:
>
>> On 05/25/2010 10:35 AM, Juan Quintela wrote:
>>
>>> problem here is that libvirt starts the target with -S, and waits to
>>> do the "cont" as soon as possible. As of now, the only way to do it
>>> is to poll info migrate on the source faster.
>>>
>> Why does it do that??
>>
>> That sounds like a terrible idea.
>>
> Because migration is not reliable, and they don't have a way to issue
> cont on only one of the sides :(
>

I don't know what you mean by reliable.

When the migration completes on the destination, it will start
automatically. The source will not start unless explicitly invoked.
If you successfully cancel a migration on the source, it's guaranteed
that it won't start on the destination. So the sequence looks like:

src) // decide we want to give up migration
src) migrate_cancel
src) // check migration status
src) cont // if migration cancelled
src) // if migration succeeded, check destination for completion
dst) // if not responsive and not completed in appropriate amount of time, kill guest
src) cont // if killed destination

I don't see what the problem is.

> Either we make the migration protocol reliable, or the management
> application has to decide whether the migration succeeded or not.
>

Reliability has nothing to do with the protocol and everything to do
with the presence of the third node.

> These new events help them a lot. But they issue the cont really fast
> (before migration ends). I don't remember why they did that.
>

If libvirt is launching the destination with -S, it's doing the wrong
thing and we ought to make sure the proper fix gets implemented.

> danp?
>
>>>> There should be some information about why it failed, no? Preferably
>>>> in a QError format.
>>>>
>>> At this point, we have basically -1 :(
>>>
>>> I can add a field with an error number, but we are very bad at the
>>> moment about moving errnos up the stack.
>>>
>> We need a better solution for reporting errors via notifications.
>>
> Suggestions?
>
> Notice that what we need now is a way to know if migration ended with
> success or in any other way, as soon as possible.
>

Markus/Luiz?

>>>> I think this makes more sense as a MIGRATION_CONNECTED event. It
>>>> probably should carry peer information too.
>>>>
>>> What kind of peer information?
>>>
>>> We have tcp/fd/exec/unix migrations. Whether we call it CONNECTED or
>>> STARTED, I don't care. But adding information? Notice that the
>>> management application knows what it did. I can put the
>>>
>>> "exec: gzip -d< /tmp/foo"
>>>
>>> string, but there is not much more I can put here.
>>>
>> Basically, do we have any useful information in info migrate that we
>> can include?
>>
> (qemu) info migrate
> Migration status: active
> transferred ram: 874808 kbytes
> remaining ram: 227912 kbytes
> total ram: 1065344 kbytes
> (qemu)
>
> I can't see anything interesting to put here :(
>

Ugh.

> About the CONNECTED/STARTED distinction, I fully agree with danp. We
> just want a STARTED event for migration; CONNECTION should be generated
> (or not) for all sockets/char devices. It doesn't make sense for
> fd/exec, for instance.
>

That makes sense to me.

Regards,

Anthony Liguori

> Later, Juan.
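The cancel-and-recover sequence Anthony outlines (src/dst steps) can be sketched from the management application's side. This is a minimal Python sketch under stated assumptions, not libvirt's or QEMU's API: `FakeMonitor`, `give_up_migration`, its `timeout`/`poll`/`clock`/`sleep` parameters and the `kill()` method are hypothetical illustrations; only the `migrate_cancel`, `cont` and `info migrate` command names come from the thread. It assumes the destination was launched without -S (so it starts on its own once migration completes), and that after `migrate_cancel` the source reports `cancelled`, `failed` or `completed`.

```python
import time
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a monitor connection to one QEMU
# instance. A real tool would talk to the monitor; this only records commands.
@dataclass
class FakeMonitor:
    name: str
    migration_status: str = "active"  # what "info migrate" would report
    running: bool = False
    responsive: bool = True
    killed: bool = False
    commands: list = field(default_factory=list)

    def command(self, cmd):
        self.commands.append(cmd)
        if cmd == "migrate_cancel" and self.migration_status == "active":
            self.migration_status = "cancelled"
        elif cmd == "cont":
            self.running = True

    def kill(self):
        # Stands in for killing the destination QEMU process (hypothetical).
        self.killed = True
        self.running = False
        self.responsive = False


def give_up_migration(src, dst, timeout=5.0, poll=0.1,
                      clock=time.monotonic, sleep=time.sleep):
    """Abandon a migration so that exactly one side ends up running.

    Returns "source" or "destination". Assumes dst was started without -S.
    """
    src.command("migrate_cancel")
    status = src.migration_status  # "check migration status"

    if status in ("cancelled", "failed"):
        # Migration did not complete, so the destination will not start.
        src.command("cont")
        return "source"

    # Otherwise assume "completed": the destination should start by itself.
    deadline = clock() + timeout
    while clock() < deadline:
        if dst.responsive and dst.running:
            return "destination"
        sleep(poll)

    # Destination never came up in time: kill it, then resume the source.
    dst.kill()
    src.command("cont")
    return "source"
```

For example, calling it while the source still reports `active` cancels the migration and resumes the source; if `migrate_cancel` arrives after completion, it waits for the destination instead and only resumes the source after killing an unresponsive destination. The clock and sleep functions are injectable so the timeout path can be exercised without real waiting.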