Date: Tue, 13 Dec 2011 17:25:47 -0200
From: Luiz Capitulino <lcapitulino@redhat.com>
Message-ID: <20111213172547.49a06204@doriath>
In-Reply-To: <4EE79722.7070807@codemonkey.ws>
References: <4EE79722.7070807@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] client_migrate_info - do we need a new command?
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: spice-devel <spice-devel@lists.freedesktop.org>, Gerd Hoffmann <kraxel@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, Avi Kivity <avi@redhat.com>

On Tue, 13 Dec 2011 12:19:14 -0600
Anthony Liguori <anthony@codemonkey.ws> wrote:

> In our call today, Avi asked that we evaluate whether the interface for 
> client_migrate_info is the Right Interface before we introduce a new command to 
> work around the fact that async commands are broken.
> 
> I looked into this today and here's what I came to.
> 
> 1) What are the failure scenarios?
> 
> The issue is qerror_report().  Roughly speaking, qerror_report either prints to 
> stderr or it associates an error with the current monitor command.
> 
> The problem with this is that qerror_report() is used all over the code base 
> today and if an error occurs in a device that has nothing to do with the 
> command, instead of printing to stderr, the command will fail with a bizarre 
> error reason (even though it really succeeded).
> 
> 2) Does the command have the right semantics?
> 
> The command has the following doc:
> 
> client_migrate_info
> ------------------
> 
> Set the spice/vnc connection info for the migration target.  The spice/vnc
> server will ask the spice/vnc client to automatically reconnect using the
> new parameters (if specified) once the vm migration finished successfully.
> 
> Arguments:
> 
> - "protocol":     protocol: "spice" or "vnc" (json-string)
> - "hostname":     migration target hostname (json-string)
> - "port":         spice/vnc tcp port for plaintext channels (json-int, optional)
> - "tls-port":     spice tcp port for tls-secured channels (json-int, optional)
> - "cert-subject": server certificate subject (json-string, optional)
> 
> Example:
> 
> -> { "execute": "client_migrate_info",
>       "arguments": { "protocol": "spice",
>                      "hostname": "virt42.lab.kraxel.org",
>                      "port": 1234 } }
> <- { "return": {} }
> 
> Originally, the command was a normal sync command and my understanding is that 
> it simply posted notification to the clients.  Apparently, users of the 
> interface need to actually know when the client has Ack'd this operation because 
> otherwise it's racy since a disconnect may occur before the client processes the 
> redirection.
> 
> OTOH, that means that what we really need is 1) tell connected clients that they 
> need to redirect 2) notification when/if connected clients are prepared to redirect.
> 
> The trouble with using an async command for this is that the time between (1) &
> (2) may be arbitrarily long.  Since most QMP clients today always use a NULL 
> tag, that effectively means the monitor is blocked for an arbitrarily long time 
> while this operation is in flight.
> 
> I don't know if libspice uses a timeout for this operation, but if it doesn't, 
> this could block arbitrarily long.  Even with tagging, we don't have a way to 
> cancel in flight commands so blocking for arbitrary time periods is problematic.
> 
> I think splitting this into two commands, one that requests the clients to 
> redirect and then an event that lets a tool know that the clients are ready to 
> migrate ends up being nicer.  It means that we never end up with a blocked QMP 
> session and clients are more likely to properly deal with the fact that an event 
> may take arbitrarily long to happen.

Someone suggested this in the other thread, and I agree with it.

As for the current command, I think converting it to QAPI will solve the
issue with the global error.
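
To make the global error problem concrete, here is a toy Python model of the
failure mode Anthony describes in (1). It is not QEMU code, and every name in
it is made up for illustration. It only shows why a reporter that attaches
errors to "whatever command is in flight" can fail a command that succeeded,
and why an error sink owned by the command's own call chain cannot.

```python
import sys

# Toy model only: all names are hypothetical, none of this is QEMU's API.
current_command = None  # stands in for "the monitor command in flight"


class Command:
    def __init__(self, name):
        self.name = name
        self.error = None


def global_report(msg):
    """Global style: attach to the in-flight command, else print to stderr."""
    if current_command is not None:
        current_command.error = msg
    else:
        print(msg, file=sys.stderr)


def unrelated_device_tick():
    # A device that has nothing to do with the command hits an error.
    global_report("device foo: I/O error")


def run_global_style(name):
    """The command's own work succeeds, but it inherits a stray error."""
    global current_command
    cmd = Command(name)
    current_command = cmd
    try:
        unrelated_device_tick()  # happens while the command is in flight
    finally:
        current_command = None
    return cmd


def run_explicit_style(name):
    """Errors are collected in a sink that only this call chain can reach."""
    cmd = Command(name)
    errors = []  # the command's own code would append here on failure
    unrelated_device_tick()  # no command is registered, so this hits stderr
    cmd.error = errors[0] if errors else None
    return cmd
```

In this model run_global_style("client_migrate_info") comes back with the
device's error attached, while run_explicit_style() comes back clean.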

> 
> Clients can also implement their own cancel logic by choosing to stop waiting 
> for an event to happen and then ignoring spurious events.
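
A rough sketch of what that client-side logic could look like, in Python. The
event name is an assumption (no name has been proposed in this thread), and
the queue is assumed to be filled with decoded QMP event dicts by whatever
code reads the monitor socket, which is not shown.

```python
import queue
import time

# Hypothetical event name; the thread does not name the event.
READY_EVENT = "CLIENT_MIGRATE_READY"


def wait_for_event(events, name, timeout):
    """Return the first event called `name`, or None once `timeout` expires.

    `events` is a queue.Queue of decoded QMP event dicts.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # the caller gives up; this is the client's "cancel"
        try:
            ev = events.get(timeout=remaining)
        except queue.Empty:
            return None
        if ev.get("event") == name:
            return ev
        # Any other event is not the one we are waiting for; keep waiting.
```

The QMP session itself never blocks here: the command returns immediately and
only the tool's own wait has a deadline. If the tool gives up and the event
shows up later, it is just another entry in the queue that the tool has to
drop instead of acting on.
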
> 
> So regardless of the async issue, I think splitting this command is the right 
> thing to do long term.
> 
> Regards,
> 
> Anthony Liguori
>