From: Luiz Capitulino
To: Anthony Liguori
Cc: spice-devel, Gerd Hoffmann, qemu-devel, Avi Kivity
Date: Tue, 13 Dec 2011 17:25:47 -0200
Subject: Re: [Qemu-devel] client_migrate_info - do we need a new command?
Message-ID: <20111213172547.49a06204@doriath>
In-Reply-To: <4EE79722.7070807@codemonkey.ws>

On Tue, 13 Dec 2011 12:19:14 -0600
Anthony Liguori wrote:

> In our call today, Avi asked that we evaluate whether the interface for
> client_migrate_info is the Right Interface before we introduce a new command to
> work around the fact that async commands are broken.
>
> I looked into this today and here is what I found.
>
> 1) What are the failure scenarios?
>
> The issue is qerror_report(). Roughly speaking, qerror_report() either prints to
> stderr or it associates an error with the current monitor command.
>
> The problem with this is that qerror_report() is used all over the code base
> today, so if an error occurs in a device that has nothing to do with the
> command while that command is executing, then instead of being printed to
> stderr, the error makes the command fail with a bizarre error reason (even
> though the command really succeeded).
>
> 2) Does the command have the right semantics?
>
> The command has the following doc:
>
> client_migrate_info
> -------------------
>
> Set the spice/vnc connection info for the migration target. The spice/vnc
> server will ask the spice/vnc client to automatically reconnect using the
> new parameters (if specified) once the vm migration finished successfully.
>
> Arguments:
>
> - "protocol": protocol: "spice" or "vnc" (json-string)
> - "hostname": migration target hostname (json-string)
> - "port": spice/vnc tcp port for plaintext channels (json-int, optional)
> - "tls-port": spice tcp port for tls-secured channels (json-int, optional)
> - "cert-subject": server certificate subject (json-string, optional)
>
> Example:
>
> -> { "execute": "client_migrate_info",
>      "arguments": { "protocol": "spice",
>                     "hostname": "virt42.lab.kraxel.org",
>                     "port": 1234 } }
> <- { "return": {} }
>
> Originally, the command was a normal sync command and my understanding is that
> it simply posted a notification to the clients. Apparently, users of the
> interface need to actually know when the client has Ack'd this operation,
> because otherwise it's racy: a disconnect may occur before the client processes
> the redirection.
>
> OTOH, that means that what we really need is 1) a way to tell connected clients
> that they need to redirect and 2) a notification when/if connected clients are
> prepared to redirect.
>
> The trouble with using an async command for this is that the time between (1)
> and (2) may be arbitrarily long. Since most QMP clients today always use a NULL
> tag, that effectively means the monitor is blocked for an arbitrarily long time
> while this operation is in flight.
>
> I don't know if libspice uses a timeout for this operation, but if it doesn't,
> this could block indefinitely. Even with tagging, we don't have a way to
> cancel in-flight commands, so blocking for arbitrary time periods is problematic.
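To make the two-step shape, (1) then (2), a bit more concrete, here is a minimal Python sketch of the bookkeeping the server side might need. It is a model only, not QEMU code: the RedirectTracker class, the CLIENTS_READY_TO_MIGRATE event name and its "generation" field are all hypothetical, and it assumes that a client which disconnects should stop holding up the event.

```python
from typing import Callable, Dict, Set

# Hypothetical event name; QEMU does not necessarily define this.
READY_EVENT = "CLIENTS_READY_TO_MIGRATE"


class RedirectTracker:
    """Model of the server side of a split client_migrate_info (not QEMU code).

    begin() plays the role of the command: it records which clients were asked
    to redirect and returns at once, so the monitor is never blocked. The event
    is emitted once each of those clients has acked or disconnected.
    """

    def __init__(self, emit_event: Callable[[str, Dict[str, int]], None]) -> None:
        self._emit_event = emit_event
        self._pending: Set[str] = set()
        self._generation = 0

    def begin(self, connected_clients: Set[str]) -> int:
        # A new request supersedes any earlier one that is still in flight.
        self._generation += 1
        self._pending = set(connected_clients)
        # Sending the redirect notification to each client would happen here;
        # it is not modelled in this sketch.
        if not self._pending:
            self._emit_ready()
        return self._generation

    def client_acked(self, client: str) -> None:
        self._settle(client)

    def client_disconnected(self, client: str) -> None:
        # Assumption: a client that went away no longer holds up the event.
        self._settle(client)

    def _settle(self, client: str) -> None:
        # Late or duplicate acks (client not pending) are ignored.
        if client not in self._pending:
            return
        self._pending.discard(client)
        if not self._pending:
            self._emit_ready()

    def _emit_ready(self) -> None:
        self._emit_event(READY_EVENT, {"generation": self._generation})
```

Because begin() returns immediately, the QMP session stays usable while the clients take their time. Whether a disconnect should count as "ready" or be reported separately in the event data is left open here.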
>
> I think splitting this into two parts, a command that requests the clients to
> redirect and an event that lets a tool know that the clients are ready to
> migrate, ends up being nicer. It means that we never end up with a blocked QMP
> session, and clients are more likely to properly deal with the fact that an
> event may take arbitrarily long to happen.

I think someone suggested this in the other thread, and I agree with it.

Wrt the current command, I think converting it to QAPI will solve the issue
with qerror_report() attaching unrelated errors to the command, since QAPI
commands report errors through an explicit Error ** argument instead.

>
> Clients can also implement their own cancel logic by choosing to stop waiting
> for an event to happen and then ignoring spurious events.
>
> So regardless of the async issue, I think splitting this command is the right
> thing to do long term.
>
> Regards,
>
> Anthony Liguori
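On the client side, "stop waiting and ignore spurious events" could look roughly like the Python sketch below. It assumes, hypothetically, that the redirect command returns an identifier and that the event carries it back in a "generation" field, so an event belonging to an abandoned request can be told apart from the current one; the event name is also made up. QMP messages are assumed to arrive already decoded into dicts on a queue.Queue.

```python
import queue
import time

# Hypothetical event name and "generation" field; neither is defined by QEMU.
READY_EVENT = "CLIENTS_READY_TO_MIGRATE"


def wait_for_ready(events: "queue.Queue[dict]", request_id: int, timeout: float) -> bool:
    """Wait up to `timeout` seconds for the ready event of request `request_id`.

    Returning False is the client's own "cancel": it simply stops waiting.
    A matching event that turns up later for this request is then stale and
    is skipped by any future call made with a newer request_id.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            msg = events.get(timeout=remaining)
        except queue.Empty:
            return False
        if msg.get("event") != READY_EVENT:
            continue  # unrelated event; a real client would dispatch it elsewhere
        if msg.get("data", {}).get("generation") != request_id:
            continue  # spurious: belongs to an earlier, abandoned request
        return True
```

A typical flow would be: send the redirect command, read the identifier from its reply, then call wait_for_ready() with whatever timeout the management tool considers reasonable. Nothing on the QEMU side blocks in the meantime.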