[Qemu-devel] client_migrate_info - do we need a new command?

* [Qemu-devel] client_migrate_info - do we need a new command?
@ 2011-12-13 18:19 Anthony Liguori
  2011-12-13 19:25 ` Luiz Capitulino
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Anthony Liguori @ 2011-12-13 18:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: spice-devel, Luiz Capitulino, Avi Kivity, Gerd Hoffmann

In our call today, Avi asked that we evaluate whether the interface for 
client_migrate_info is the Right Interface before we introduce a new command to 
work around the fact that async commands are broken.

I looked into this today and here's what I came up with.

1) What are the failure scenarios?

The issue is qerror_report().  Roughly speaking, qerror_report either prints to 
stderr or it associates an error with the current monitor command.

The problem with this is that qerror_report() is used all over the code base 
today and if an error occurs in a device that has nothing to do with the 
command, instead of printing to stderr, the command will fail with a bizarre 
error reason (even though it really succeeded).
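
The failure mode above can be reproduced with a toy model. This is only a sketch, not QEMU's code: `Monitor`, `cur_mon`, `run_command` and `unrelated_device_tick` are made-up names, and the Python `qerror_report` merely imitates the "print to stderr or attach to the current command" behaviour described here.

```python
import io


class Monitor:
    """Toy stand-in for a monitor that executes one command at a time."""

    def __init__(self):
        self.current_error = None  # error attached to the in-flight command


cur_mon = None              # global "current monitor"; None when idle
stderr_log = io.StringIO()  # stands in for stderr in this sketch


def qerror_report(msg):
    """Toy version: attach to the current command if any, else 'print'."""
    if cur_mon is not None:
        cur_mon.current_error = msg
    else:
        stderr_log.write(msg + "\n")


def unrelated_device_tick():
    # A device hits an error that has nothing to do with any command.
    qerror_report("device foo: I/O error")


def run_command(mon, handler):
    """Run handler as the current command and build a QMP-style reply."""
    global cur_mon
    cur_mon = mon
    mon.current_error = None
    try:
        handler()
    finally:
        cur_mon = None
    if mon.current_error is not None:
        return {"error": mon.current_error}
    return {"return": {}}
```

If `unrelated_device_tick()` happens to run while a command is current, the command's reply carries the device's error even though the command itself did nothing wrong; when no command is current, the same call only writes to the log.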

2) Does the command have the right semantics?

The command has the following doc:

client_migrate_info
------------------

Set the spice/vnc connection info for the migration target.  The spice/vnc
server will ask the spice/vnc client to automatically reconnect using the
new parameters (if specified) once the vm migration finished successfully.

Arguments:

- "protocol":     protocol: "spice" or "vnc" (json-string)
- "hostname":     migration target hostname (json-string)
- "port":         spice/vnc tcp port for plaintext channels (json-int, optional)
- "tls-port":     spice tcp port for tls-secured channels (json-int, optional)
- "cert-subject": server certificate subject (json-string, optional)

Example:

-> { "execute": "client_migrate_info",
      "arguments": { "protocol": "spice",
                     "hostname": "virt42.lab.kraxel.org",
                     "port": 1234 } }
<- { "return": {} }

Originally, the command was a normal sync command and my understanding is that 
it simply posted notification to the clients.  Apparently, users of the 
interface need to actually know when the client has Ack'd this operation because 
otherwise it's racy since a disconnect may occur before the client processes the 
redirection.

OTOH, that means that what we really need is 1) a way to tell connected clients
that they need to redirect and 2) a notification when/if connected clients are
prepared to redirect.

The trouble with using an async command for this is that the time between (1) &
(2) may be arbitrarily long.  Since most QMP clients today always use a NULL
tag, that effectively means the monitor is blocked for an arbitrarily long time
while this operation is in flight.

I don't know if libspice uses a timeout for this operation, but if it doesn't,
this could block arbitrarily long.  Even with tagging, we don't have a way to
cancel in-flight commands so blocking for arbitrary time periods is problematic.

I think splitting this into two parts, a command that requests the clients to
redirect and an event that lets a tool know that the clients are ready to
migrate, ends up being nicer.  It means that we never end up with a blocked QMP
session and clients are more likely to properly deal with the fact that an event
may take arbitrarily long to happen.

Clients can also implement their own cancel logic by choosing to stop waiting 
for an event to happen and then ignoring spurious events.
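
A rough sketch of that client-side cancel logic, assuming the QMP reader thread pushes parsed events into a `queue.Queue`. The event name `CLIENT_MIGRATE_READY` and the helper `wait_for_event` are hypothetical; nothing in this thread defines them.

```python
import queue
import time

# Hypothetical event name; no such event has been defined yet.
READY_EVENT = "CLIENT_MIGRATE_READY"


def wait_for_event(events, name, timeout):
    """Return the first event called `name`, or None once `timeout` expires.

    `events` is a queue.Queue of dicts shaped like QMP events.
    Events with any other name are skipped.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # the caller gives up: this is the "cancel"
        try:
            ev = events.get(timeout=remaining)
        except queue.Empty:
            return None
        if ev.get("event") == name:
            return ev
```

A caller that got `None` has effectively cancelled. If the event still arrives later it is a spurious event, and the client drops it because nothing is waiting for it any more.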

So regardless of the async issue, I think splitting this command is the right 
thing to do long term.

Regards,

Anthony Liguori

* Re: [Qemu-devel] client_migrate_info - do we need a new command?
  2011-12-13 18:19 [Qemu-devel] client_migrate_info - do we need a new command? Anthony Liguori
@ 2011-12-13 19:25 ` Luiz Capitulino
  2011-12-14  9:26 ` [Qemu-devel] [Spice-devel] " Yonit Halperin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Luiz Capitulino @ 2011-12-13 19:25 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: spice-devel, Gerd Hoffmann, qemu-devel, Avi Kivity

On Tue, 13 Dec 2011 12:19:14 -0600
Anthony Liguori <anthony@codemonkey.ws> wrote:

> In our call today, Avi asked that we evaluate whether the interface for 
> client_migrate_info is the Right Interface before we introduce a new command to 
> work around the fact that async commands are broken.
> 
> I looked into this today and here's what I came to.
> 
> 1) What are the failure scenarios?
> 
> The issue is qerror_report().  Roughly speaking, qerror_report either prints to 
> stderr or it associates an error with the current monitor command.
> 
> The problem with this is that qerror_report() is used all over the code base 
> today and if an error occurs in a device that has nothing to do with the 
> command, instead of printing to stderr, the command will fail with a bizarre 
> error reason (even though it really succeeded).
> 
> 2) Does the command have the right semantics?
> 
> The command has the following doc:
> 
> client_migrate_info
> ------------------
> 
> Set the spice/vnc connection info for the migration target.  The spice/vnc
> server will ask the spice/vnc client to automatically reconnect using the
> new parameters (if specified) once the vm migration finished successfully.
> 
> Arguments:
> 
> - "protocol":     protocol: "spice" or "vnc" (json-string)
> - "hostname":     migration target hostname (json-string)
> - "port":         spice/vnc tcp port for plaintext channels (json-int, optional)
> - "tls-port":     spice tcp port for tls-secured channels (json-int, optional)
> - "cert-subject": server certificate subject (json-string, optional)
> 
> Example:
> 
> -> { "execute": "client_migrate_info",
>       "arguments": { "protocol": "spice",
>                      "hostname": "virt42.lab.kraxel.org",
>                      "port": 1234 } }
> <- { "return": {} }
> 
> Originally, the command was a normal sync command and my understanding is that 
> it simply posted notification to the clients.  Apparently, users of the 
> interface need to actually know when the client has Ack'd this operation because 
> otherwise it's racy since a disconnect may occur before the client processes the 
> redirection.
> 
> OTOH, that means that what we really need is 1) tell connected clients that they 
> need to redirect 2) notification when/if connected clients are prepared to redirect.
> 
> The trouble with using a async command for this is that the time between (1) & 
> (2) may be arbitrarily long.  Since most QMP clients today always use a NULL 
> tag, that effectively means the monitor is blocked for an arbitrarily long time 
> while this operation is in flight.
> 
> I don't know if libspice uses a timeout for this operation, but if it doesn't, 
> this could block arbitrarily long.  Even with tagging, we don't have a way to 
> cancel in flight commands so blocking for arbitrary time periods is problematic.
> 
> I think splitting this into two commands, one that requests the clients to 
> redirect and then an event that lets a tool know that the clients are ready to 
> migrate ends up being nicer.  It means that we never end up with a blocked QMP 
> session and clients are more likely to properly deal with the fact that an event 
> may take arbitrarily long to happen.

I think someone has suggested this in the other thread, and I agree with it.

Wrt the current command, I think that converting it to QAPI will solve the
issue with the global error.

> 
> Clients can also implement their own cancel logic by choosing to stop waiting 
> for an event to happen and then ignoring spurious events.
> 
> So regardless of the async issue, I think splitting this command is the right 
> thing to do long term.
> 
> Regards,
> 
> Anthony Liguori
> 

* Re: [Qemu-devel] [Spice-devel] client_migrate_info - do we need a new command?
  2011-12-13 18:19 [Qemu-devel] client_migrate_info - do we need a new command? Anthony Liguori
  2011-12-13 19:25 ` Luiz Capitulino
@ 2011-12-14  9:26 ` Yonit Halperin
  2011-12-14 10:16 ` [Qemu-devel] " Avi Kivity
  2011-12-15  9:03 ` Gerd Hoffmann
  3 siblings, 0 replies; 5+ messages in thread
From: Yonit Halperin @ 2011-12-14  9:26 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: spice-devel, Avi Kivity, qemu-devel, Luiz Capitulino

Hi,
On 12/13/2011 08:19 PM, Anthony Liguori wrote:
> In our call today, Avi asked that we evaluate whether the interface for
> client_migrate_info is the Right Interface before we introduce a new
> command to work around the fact that async commands are broken.
>
> I looked into this today and here's what I came to.
>
> 1) What are the failure scenarios?
>
> The issue is qerror_report(). Roughly speaking, qerror_report either
> prints to stderr or it associates an error with the current monitor
> command.
>
> The problem with this is that qerror_report() is used all over the code
> base today and if an error occurs in a device that has nothing to do
> with the command, instead of printing to stderr, the command will fail
> with a bizarre error reason (even though it really succeeded).
>
> 2) Does the command have the right semantics?
>
> The command has the following doc:
>
> client_migrate_info
> ------------------
>
> Set the spice/vnc connection info for the migration target. The spice/vnc
> server will ask the spice/vnc client to automatically reconnect using the
> new parameters (if specified) once the vm migration finished successfully.
>
> Arguments:
>
> - "protocol": protocol: "spice" or "vnc" (json-string)
> - "hostname": migration target hostname (json-string)
> - "port": spice/vnc tcp port for plaintext channels (json-int, optional)
> - "tls-port": spice tcp port for tls-secured channels (json-int, optional)
> - "cert-subject": server certificate subject (json-string, optional)
>
> Example:
>
> -> { "execute": "client_migrate_info",
> "arguments": { "protocol": "spice",
> "hostname": "virt42.lab.kraxel.org",
> "port": 1234 } }
> <- { "return": {} }
>
> Originally, the command was a normal sync command and my understanding
> is that it simply posted notification to the clients. Apparently, users
> of the interface need to actually know when the client has Ack'd this
> operation because otherwise it's racy since a disconnect may occur
> before the client processes the redirection.
>
It's racy because the migration can start before the client manages to
connect to the migration target. And since the target is unresponsive
during migration, the client will manage to connect to it only after
migration completes; but that can take a while, and the client's ticket
might expire by then.

> OTOH, that means that what we really need is 1) tell connected clients
> that they need to redirect 2) notification when/if connected clients are
> prepared to redirect.
>
> The trouble with using a async command for this is that the time between
> (1) & (2) may be arbitrarily long. Since most QMP clients today always
> use a NULL tag, that effectively means the monitor is blocked for an
> arbitrarily long time while this operation is in flight.
>
> I don't know if libspice uses a timeout for this operation,

We use a timeout of 10 seconds.

> but if it doesn't, this could block arbitrarily long. Even with tagging,
> we don't have a way to cancel in flight commands so blocking for arbitrary
> time periods is problematic.
>
> I think splitting this into two commands, one that requests the clients
> to redirect and then an event that lets a tool know that the clients are
> ready to migrate ends up being nicer. It means that we never end up with
> a blocked QMP session and clients are more likely to properly deal with
> the fact that an event may take arbitrarily long to happen.
>
> Clients can also implement their own cancel logic by choosing to stop
> waiting for an event to happen and then ignoring spurious events.
>
> So regardless of the async issue, I think splitting this command is the
> right thing to do long term.

I just want to emphasize that using client_migrate_info for connecting
to the target is more of a workaround. IMHO, the more complete solution
would have been (similar to the one we have in RHEL5):
1) Add a migration notifier for pre-starting migration. Introduce a
completion cb for these notifiers.
2) Actually start migration only after the completion cb is called.

Then, client_migrate_info can go back to being sync.
If we already plan to make changes, maybe they should be aimed at such a
solution.
Btw, if we also had such a notifier for pre-finish migration (before
starting the target vm), we could even make the client migration
really seamless again.
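
To illustrate steps 1) and 2), here is a minimal sketch of a pre-start notifier list with a completion callback. The class `PreMigrationNotifiers` and its methods are invented for this example; it does not reflect the RHEL5 code or any existing QEMU interface.

```python
class PreMigrationNotifiers:
    """Hypothetical registry of pre-start migration notifiers."""

    def __init__(self):
        self._notifiers = []

    def add(self, fn):
        """Register fn(done); fn must call done() once it has finished."""
        self._notifiers.append(fn)

    def run(self, start_migration):
        """Call every notifier; start migration after all have completed."""
        state = {"pending": len(self._notifiers)}
        if state["pending"] == 0:
            start_migration()
            return

        def done():
            # Completion cb: the last notifier to finish starts migration.
            state["pending"] -= 1
            if state["pending"] == 0:
                start_migration()

        for fn in list(self._notifiers):
            fn(done)
```

A spice notifier would keep `done` around and call it once the client has connected to the target (or the timeout fired), so migration cannot start before that point.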

Regards,
Yonit.

>
> Regards,
>
> Anthony Liguori

* Re: [Qemu-devel] client_migrate_info - do we need a new command?
  2011-12-13 18:19 [Qemu-devel] client_migrate_info - do we need a new command? Anthony Liguori
  2011-12-13 19:25 ` Luiz Capitulino
  2011-12-14  9:26 ` [Qemu-devel] [Spice-devel] " Yonit Halperin
@ 2011-12-14 10:16 ` Avi Kivity
  2011-12-15  9:03 ` Gerd Hoffmann
  3 siblings, 0 replies; 5+ messages in thread
From: Avi Kivity @ 2011-12-14 10:16 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: spice-devel, Luiz Capitulino, qemu-devel, Gerd Hoffmann

On 12/13/2011 08:19 PM, Anthony Liguori wrote:
> In our call today, Avi asked that we evaluate whether the interface
> for client_migrate_info is the Right Interface before we introduce a
> new command to work around the fact that async commands are broken.
>
> I looked into this today and here's what I came to.
>

Thanks.

> 1) What are the failure scenarios?
>
> The issue is qerror_report().  Roughly speaking, qerror_report either
> prints to stderr or it associates an error with the current monitor
> command.
>
> The problem with this is that qerror_report() is used all over the
> code base today and if an error occurs in a device that has nothing to
> do with the command, instead of printing to stderr, the command will
> fail with a bizarre error reason (even though it really succeeded).
>
> 2) Does the command have the right semantics?
>
> The command has the following doc:
>
> client_migrate_info
> ------------------

Somewhat poorly named - commands should be verbs.

>
> Set the spice/vnc connection info for the migration target.  The
> spice/vnc
> server will ask the spice/vnc client to automatically reconnect using the
> new parameters (if specified) once the vm migration finished
> successfully.
>
> Arguments:
>
> - "protocol":     protocol: "spice" or "vnc" (json-string)
> - "hostname":     migration target hostname (json-string)
> - "port":         spice/vnc tcp port for plaintext channels (json-int,
> optional)
> - "tls-port":     spice tcp port for tls-secured channels (json-int,
> optional)
> - "cert-subject": server certificate subject (json-string, optional)
>
> Example:
>
> -> { "execute": "client_migrate_info",
>      "arguments": { "protocol": "spice",
>                     "hostname": "virt42.lab.kraxel.org",
>                     "port": 1234 } }
> <- { "return": {} }
>
> Originally, the command was a normal sync command and my understanding
> is that it simply posted notification to the clients.  Apparently,
> users of the interface need to actually know when the client has Ack'd
> this operation because otherwise it's racy since a disconnect may
> occur before the client processes the redirection.
>
> OTOH, that means that what we really need is 1) tell connected clients
> that they need to redirect 2) notification when/if connected clients
> are prepared to redirect.
>
> The trouble with using a async command for this is that the time
> between (1) & (2) may be arbitrarily long.  Since most QMP clients
> today always use a NULL tag, that effectively means the monitor is
> blocked for an arbitrarily long time while this operation is in flight.
>
> I don't know if libspice uses a timeout for this operation, but if it
> doesn't, this could block arbitrarily long.  Even with tagging, we
> don't have a way to cancel in flight commands so blocking for
> arbitrary time periods is problematic.
>
> I think splitting this into two commands, one that requests the
> clients to redirect and then an event that lets a tool know that the
> clients are ready to migrate ends up being nicer.  It means that we
> never end up with a blocked QMP session and clients are more likely to
> properly deal with the fact that an event may take arbitrarily long to
> happen.
>
> Clients can also implement their own cancel logic by choosing to stop
> waiting for an event to happen and then ignoring spurious events.
>
> So regardless of the async issue, I think splitting this command is
> the right thing to do long term.
>

Nothing is solved by the split; it has exactly the same issues.  If an
error occurs during execution of the command (say, a timeout), you need
to capture the error and return it during the event.  If the command
consumes resources or takes a lock, you need to send a cancellation
request or it will continue executing.  You've simply renamed the return
part of the RPC to an event.
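
As a sketch of that point: a client has to ask the same success-or-error question of the event that it would ask of a command reply. The reply layout follows QMP's `return`/`error` convention; the event payload with an `error` member is purely hypothetical, as are both function names.

```python
def outcome_from_reply(reply):
    """Outcome of a QMP command reply: ('ok', value) or ('error', desc)."""
    if "error" in reply:
        return ("error", reply["error"]["desc"])
    return ("ok", reply["return"])


def outcome_from_event(event):
    """The same question asked of a hypothetical completion event."""
    data = event.get("data", {})
    if "error" in data:
        # The error captured during execution has to travel in the event.
        return ("error", data["error"]["desc"])
    return ("ok", data)
```

Both functions do the same job, which is the sense in which the event is just the return part of the RPC under another name.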

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] client_migrate_info - do we need a new command?
  2011-12-13 18:19 [Qemu-devel] client_migrate_info - do we need a new command? Anthony Liguori
                   ` (2 preceding siblings ...)
  2011-12-14 10:16 ` [Qemu-devel] " Avi Kivity
@ 2011-12-15  9:03 ` Gerd Hoffmann
  3 siblings, 0 replies; 5+ messages in thread
From: Gerd Hoffmann @ 2011-12-15  9:03 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: spice-devel, Luiz Capitulino, qemu-devel, Avi Kivity

  Hi,

> Originally, the command was a normal sync command and my understanding
> is that it simply posted notification to the clients.  Apparently, users
> of the interface need to actually know when the client has Ack'd this
> operation because otherwise it's racy since a disconnect may occur
> before the client processes the redirection.

No.  The problem is that qemu doesn't process any other I/O while an
incoming migration is running, that's why we have to serialize things:
First have the spice client connect to the target (and wait for that op
to finish).  Then kick off live migration.
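
The ordering can be sketched from a management tool's point of view like this. `migrate_with_spice`, `send` and `wait_client_connected` are hypothetical names, and aborting when the client never connects is an assumption of this sketch rather than something decided in this thread.

```python
def migrate_with_spice(send, wait_client_connected, host, spice_port, migrate_uri):
    """Serialize the two steps: redirect the spice client, then migrate.

    send(msg) delivers one QMP command (a dict) to the source qemu.
    wait_client_connected() is a hypothetical helper that returns True once
    the client has connected to the target and False on timeout.
    """
    send({
        "execute": "client_migrate_info",
        "arguments": {"protocol": "spice", "hostname": host, "port": spice_port},
    })
    if not wait_client_connected():
        # Assumption of this sketch: give up rather than migrate anyway.
        return False
    # Only now kick off live migration.
    send({"execute": "migrate", "arguments": {"uri": migrate_uri}})
    return True
```

How `wait_client_connected()` learns about the connection (async command completion, an event, or something else) is exactly what is under discussion here.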

That reminds me that there is another way to fix it:  Simply lift the
restriction to not process I/O while the incoming live migration runs,
then the need to serialize goes away.

What is the status here?  I remember this being discussed in the past
for other reasons.  Also moving migration to a thread would probably
easily allow the I/O thread to run in parallel ...

> The trouble with using a async command for this is that the time between
> (1) & (2) may be arbitrarily long.  Since most QMP clients today always
> use a NULL tag, that effectively means the monitor is blocked for an
> arbitrarily long time while this operation is in flight.

There is a pretty short timeout (five seconds or so).

cheers,
  Gerd
