[Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
@ 2012-01-04 19:45 Luiz Capitulino
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 1/2] qemu-ga: set O_NONBLOCK for serial channels Luiz Capitulino
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-04 19:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, jcody, mdroth

This version drops modes 'sleep' and 'hybrid' because they don't work
properly due to issues in qemu. Only the 'hibernate' mode is supported
for now.

Also note that virtio doesn't currently support ACPI S4. There are
patches flying on lkml to fix that though.

Please refer to patch 2/2 for more details on the implementation.

v4

o Drop 'sleep' and 'hybrid' modes
o Pull in a fix from Michael Roth (patch 1/2)

 qapi-schema-guest.json     |   23 ++++++++++++++++++
 qemu-ga.c                  |   19 +++++++++++++-
 qga/guest-agent-commands.c |   55 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+), 2 deletions(-)


* [Qemu-devel] [PATCH 1/2] qemu-ga: set O_NONBLOCK for serial channels
  2012-01-04 19:45 [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command Luiz Capitulino
@ 2012-01-04 19:45 ` Luiz Capitulino
  2012-01-04 19:55   ` Michael Roth
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command Luiz Capitulino
  2012-01-05 10:16 ` [Qemu-devel] [PATCH v4 0/2]: " Daniel P. Berrange
  2 siblings, 1 reply; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-04 19:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, jcody, mdroth

This fixes a bug when using -m isa-serial, where qemu-ga would
hang in read() when communicating with the host via isa-serial.

Original fix by Michael Roth.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 qemu-ga.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu-ga.c b/qemu-ga.c
index 200bb15..98e4dfe 100644
--- a/qemu-ga.c
+++ b/qemu-ga.c
@@ -504,7 +504,7 @@ static void init_guest_agent(GAState *s)
             exit(EXIT_FAILURE);
         }
     } else if (strcmp(s->method, "isa-serial") == 0) {
-        fd = qemu_open(s->path, O_RDWR | O_NOCTTY);
+        fd = qemu_open(s->path, O_RDWR | O_NOCTTY | O_NONBLOCK);
         if (fd == -1) {
             g_critical("error opening channel: %s", strerror(errno));
             exit(EXIT_FAILURE);
-- 
1.7.8.2.321.g4570a.dirty


* [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command
  2012-01-04 19:45 [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command Luiz Capitulino
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 1/2] qemu-ga: set O_NONBLOCK for serial channels Luiz Capitulino
@ 2012-01-04 19:45 ` Luiz Capitulino
  2012-01-04 20:00   ` Michael Roth
                     ` (2 more replies)
  2012-01-05 10:16 ` [Qemu-devel] [PATCH v4 0/2]: " Daniel P. Berrange
  2 siblings, 3 replies; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-04 19:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, jcody, mdroth

For now it only supports the "hibernate" mode, which suspends the
guest to disk.

This command will try to execute the scripts provided by the pm-utils
package. If that fails, it will try to suspend manually by writing
to the "/sys/power/state" file.

To reap terminated children, a new signal handler is installed to
catch SIGCHLD signals and a non-blocking call to waitpid() is done to
collect their exit statuses.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 qapi-schema-guest.json     |   23 ++++++++++++++++++
 qemu-ga.c                  |   17 ++++++++++++-
 qga/guest-agent-commands.c |   55 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 1 deletions(-)

diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json
index 5f8a18d..b151670 100644
--- a/qapi-schema-guest.json
+++ b/qapi-schema-guest.json
@@ -219,3 +219,26 @@
 ##
 { 'command': 'guest-fsfreeze-thaw',
   'returns': 'int' }
+
+##
+# @guest-suspend
+#
+# Suspend guest execution by changing the guest's ACPI power state.
+#
+# This command tries to execute the scripts provided by the pm-utils
+# package. If they are not available, it will perform the suspend
+# operation by manually writing to a sysfs file.
+#
+# For the best results it's strongly recommended to have the pm-utils
+# package installed in the guest.
+#
+# @mode: 'hibernate' RAM content is saved to the disk and the guest is
+#                    powered off (this corresponds to ACPI S4)
+#
+# Notes: This is an asynchronous request. There's no guarantee a response
+# will be sent. Errors will be logged to guest's syslog. More modes are
+# expected in the future.
+#
+# Since: 1.1
+##
+{ 'command': 'guest-suspend', 'data': { 'mode': 'str' } }
diff --git a/qemu-ga.c b/qemu-ga.c
index 98e4dfe..5b7a7a5 100644
--- a/qemu-ga.c
+++ b/qemu-ga.c
@@ -17,6 +17,7 @@
 #include <getopt.h>
 #include <termios.h>
 #include <syslog.h>
+#include <sys/wait.h>
 #include "qemu_socket.h"
 #include "json-streamer.h"
 #include "json-parser.h"
@@ -59,9 +60,15 @@ static void quit_handler(int sig)
     }
 }
 
+static void child_handler(int sig)
+{
+    int status;
+    waitpid(-1, &status, WNOHANG);
+}
+
 static void register_signal_handlers(void)
 {
-    struct sigaction sigact;
+    struct sigaction sigact, sigact_chld;
     int ret;
 
     memset(&sigact, 0, sizeof(struct sigaction));
@@ -76,6 +83,14 @@ static void register_signal_handlers(void)
     if (ret == -1) {
         g_error("error configuring signal handler: %s", strerror(errno));
     }
+
+    memset(&sigact_chld, 0, sizeof(struct sigaction));
+    sigact_chld.sa_handler = child_handler;
+    sigact_chld.sa_flags = SA_NOCLDSTOP;
+    ret = sigaction(SIGCHLD, &sigact_chld, NULL);
+    if (ret == -1) {
+        g_error("error configuring signal handler: %s", strerror(errno));
+    }
 }
 
 static void usage(const char *cmd)
diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c
index a09c8ca..19f29c6 100644
--- a/qga/guest-agent-commands.c
+++ b/qga/guest-agent-commands.c
@@ -574,6 +574,61 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
 }
 #endif
 
+#define LINUX_SYS_STATE_FILE "/sys/power/state"
+
+void qmp_guest_suspend(const char *mode, Error **err)
+{
+    pid_t pid;
+    const char *pmutils_bin;
+
+    /* TODO implement 'sleep' and 'hybrid' modes once qemu is fixed to
+       support them */
+    if (strcmp(mode, "hibernate") == 0) {
+        pmutils_bin = "pm-hibernate";
+    } else {
+        error_set(err, QERR_INVALID_PARAMETER, "mode");
+        return;
+    }
+
+    pid = fork();
+    if (pid == 0) {
+        /* child */
+        int fd;
+
+        setsid();
+        fclose(stdin);
+        fclose(stdout);
+        fclose(stderr);
+
+        execlp(pmutils_bin, pmutils_bin, NULL);
+
+        /* 
+         * The exec call should not return, if it does something went wrong.
+         * In this case we try to suspend manually if 'mode' is 'hibernate'
+         */
+        slog("could not execute %s: %s\n", pmutils_bin, strerror(errno));
+        slog("trying to suspend using the manual method...\n");
+
+        fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
+        if (fd < 0) {
+            slog("can't open file %s: %s\n", LINUX_SYS_STATE_FILE,
+                    strerror(errno));
+            exit(1);
+        }
+
+        if (write(fd, "disk", 4) < 0) {
+            slog("can't write to %s: %s\n", LINUX_SYS_STATE_FILE,
+                    strerror(errno));
+            exit(1);
+        }
+
+        exit(0);
+    } else if (pid < 0) {
+        error_set(err, QERR_UNDEFINED_ERROR);
+        return;
+    }
+}
+
 /* register init/cleanup routines for stateful command groups */
 void ga_command_state_init(GAState *s, GACommandState *cs)
 {
-- 
1.7.8.2.321.g4570a.dirty


* Re: [Qemu-devel] [PATCH 1/2] qemu-ga: set O_NONBLOCK for serial channels
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 1/2] qemu-ga: set O_NONBLOCK for serial channels Luiz Capitulino
@ 2012-01-04 19:55   ` Michael Roth
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Roth @ 2012-01-04 19:55 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel

On 01/04/2012 01:45 PM, Luiz Capitulino wrote:
> This fixes a bug when using -m isa-serial, where qemu-ga would
> hang in read() when communicating with the host via isa-serial.
>
> Original fix by Michael Roth.
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>   qemu-ga.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/qemu-ga.c b/qemu-ga.c
> index 200bb15..98e4dfe 100644
> --- a/qemu-ga.c
> +++ b/qemu-ga.c
> @@ -504,7 +504,7 @@ static void init_guest_agent(GAState *s)
>               exit(EXIT_FAILURE);
>           }
>       } else if (strcmp(s->method, "isa-serial") == 0) {
> -        fd = qemu_open(s->path, O_RDWR | O_NOCTTY);
> +        fd = qemu_open(s->path, O_RDWR | O_NOCTTY | O_NONBLOCK);
>           if (fd == -1) {
>               g_critical("error opening channel: %s", strerror(errno));
>               exit(EXIT_FAILURE);

Thanks for sending this.

Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
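
For readers following along, here is a minimal sketch of what O_NONBLOCK
changes for a caller of read(). It uses a pipe rather than a real isa-serial
device, and set_nonblocking() and read_if_ready() are hypothetical helper
names that do not appear in qemu-ga; the agent's actual read path is not
shown in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: switch an already-open descriptor to non-blocking
 * mode. The patch gets the same effect by passing O_NONBLOCK to open().
 */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: returns the number of bytes read, 0 at end of file,
 * -1 on a real error, and -2 when no data is available right now.
 */
static ssize_t read_if_ready(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return -2;   /* a blocking descriptor would have hung here instead */
    }
    return n;
}
```

With a blocking descriptor the first read() on an idle channel simply does
not return; with O_NONBLOCK the caller gets control back and can go on to
poll or service other events.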


* Re: [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command Luiz Capitulino
@ 2012-01-04 20:00   ` Michael Roth
  2012-01-04 20:03   ` Eric Blake
  2012-01-05 12:46   ` Daniel P. Berrange
  2 siblings, 0 replies; 21+ messages in thread
From: Michael Roth @ 2012-01-04 20:00 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel

On 01/04/2012 01:45 PM, Luiz Capitulino wrote:
> For now it only supports the "hibernate" mode, which suspends the
> guest to disk.
>
> This command will try to execute the scripts provided by the pm-utils
> package. If that fails, it will try to suspend manually by writing
> to the "/sys/power/state" file.
>
> To reap terminated children, a new signal handler is installed to
> catch SIGCHLD signals and a non-blocking call to waitpid() is done to
> collect their exit statuses.
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

Looks good.

Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
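
A side note on the reaping scheme described in the commit message: standard
signals are not queued, so several SIGCHLD deliveries can be merged into one,
and a handler that calls waitpid() once may leave a zombie behind if two
children exit close together. A minimal sketch of a looping variant follows.
It is an illustration under stated assumptions, not the handler from the
patch: reaped_children and install_child_handler() are hypothetical names,
and SA_RESTART is an extra flag that the patch does not set.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter, only here so the effect can be observed. */
static volatile sig_atomic_t reaped_children;

static void child_handler(int sig)
{
    int saved_errno = errno;   /* waitpid() may clobber errno */

    (void)sig;
    /* Keep reaping until no terminated child is left. */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        reaped_children++;
    }
    errno = saved_errno;
}

/* Hypothetical helper: returns 0 on success, -1 on error. */
static int install_child_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = child_handler;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART is an addition in this sketch, not part of the patch. */
    sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Since guest-suspend normally has a single child in flight, the one-shot
handler in the patch is probably sufficient in practice; the loop only
matters once more commands start forking.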

> ---
>   qapi-schema-guest.json     |   23 ++++++++++++++++++
>   qemu-ga.c                  |   17 ++++++++++++-
>   qga/guest-agent-commands.c |   55 ++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 94 insertions(+), 1 deletions(-)
>
> diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json
> index 5f8a18d..b151670 100644
> --- a/qapi-schema-guest.json
> +++ b/qapi-schema-guest.json
> @@ -219,3 +219,26 @@
>   ##
>   { 'command': 'guest-fsfreeze-thaw',
>     'returns': 'int' }
> +
> +##
> +# @guest-suspend
> +#
> +# Suspend guest execution by changing the guest's ACPI power state.
> +#
> +# This command tries to execute the scripts provided by the pm-utils
> +# package. If they are not available, it will perform the suspend
> +# operation by manually writing to a sysfs file.
> +#
> +# For the best results it's strongly recommended to have the pm-utils
> +# package installed in the guest.
> +#
> +# @mode: 'hibernate' RAM content is saved to the disk and the guest is
> +#                    powered off (this corresponds to ACPI S4)
> +#
> +# Notes: This is an asynchronous request. There's no guarantee a response
> +# will be sent. Errors will be logged to guest's syslog. More modes are
> +# expected in the future.
> +#
> +# Since: 1.1
> +##
> +{ 'command': 'guest-suspend', 'data': { 'mode': 'str' } }
> diff --git a/qemu-ga.c b/qemu-ga.c
> index 98e4dfe..5b7a7a5 100644
> --- a/qemu-ga.c
> +++ b/qemu-ga.c
> @@ -17,6 +17,7 @@
>   #include <getopt.h>
>   #include <termios.h>
>   #include <syslog.h>
> +#include <sys/wait.h>
>   #include "qemu_socket.h"
>   #include "json-streamer.h"
>   #include "json-parser.h"
> @@ -59,9 +60,15 @@ static void quit_handler(int sig)
>       }
>   }
>
> +static void child_handler(int sig)
> +{
> +    int status;
> +    waitpid(-1, &status, WNOHANG);
> +}
> +
>   static void register_signal_handlers(void)
>   {
> -    struct sigaction sigact;
> +    struct sigaction sigact, sigact_chld;
>       int ret;
>
>       memset(&sigact, 0, sizeof(struct sigaction));
> @@ -76,6 +83,14 @@ static void register_signal_handlers(void)
>       if (ret == -1) {
>           g_error("error configuring signal handler: %s", strerror(errno));
>       }
> +
> +    memset(&sigact_chld, 0, sizeof(struct sigaction));
> +    sigact_chld.sa_handler = child_handler;
> +    sigact_chld.sa_flags = SA_NOCLDSTOP;
> +    ret = sigaction(SIGCHLD, &sigact_chld, NULL);
> +    if (ret == -1) {
> +        g_error("error configuring signal handler: %s", strerror(errno));
> +    }
>   }
>
>   static void usage(const char *cmd)
> diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c
> index a09c8ca..19f29c6 100644
> --- a/qga/guest-agent-commands.c
> +++ b/qga/guest-agent-commands.c
> @@ -574,6 +574,61 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
>   }
>   #endif
>
> +#define LINUX_SYS_STATE_FILE "/sys/power/state"
> +
> +void qmp_guest_suspend(const char *mode, Error **err)
> +{
> +    pid_t pid;
> +    const char *pmutils_bin;
> +
> +    /* TODO implement 'sleep' and 'hybrid' modes once qemu is fixed to
> +       support them */
> +    if (strcmp(mode, "hibernate") == 0) {
> +        pmutils_bin = "pm-hibernate";
> +    } else {
> +        error_set(err, QERR_INVALID_PARAMETER, "mode");
> +        return;
> +    }
> +
> +    pid = fork();
> +    if (pid == 0) {
> +        /* child */
> +        int fd;
> +
> +        setsid();
> +        fclose(stdin);
> +        fclose(stdout);
> +        fclose(stderr);
> +
> +        execlp(pmutils_bin, pmutils_bin, NULL);
> +
> +        /*
> +         * The exec call should not return, if it does something went wrong.
> +         * In this case we try to suspend manually if 'mode' is 'hibernate'
> +         */
> +        slog("could not execute %s: %s\n", pmutils_bin, strerror(errno));
> +        slog("trying to suspend using the manual method...\n");
> +
> +        fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
> +        if (fd < 0) {
> +            slog("can't open file %s: %s\n", LINUX_SYS_STATE_FILE,
> +                    strerror(errno));
> +            exit(1);
> +        }
> +
> +        if (write(fd, "disk", 4) < 0) {
> +            slog("can't write to %s: %s\n", LINUX_SYS_STATE_FILE,
> +                    strerror(errno));
> +            exit(1);
> +        }
> +
> +        exit(0);
> +    } else if (pid < 0) {
> +        error_set(err, QERR_UNDEFINED_ERROR);
> +        return;
> +    }
> +}
> +
>   /* register init/cleanup routines for stateful command groups */
>   void ga_command_state_init(GAState *s, GACommandState *cs)
>   {


* Re: [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command Luiz Capitulino
  2012-01-04 20:00   ` Michael Roth
@ 2012-01-04 20:03   ` Eric Blake
  2012-01-05 12:29     ` Luiz Capitulino
  2012-01-05 12:46   ` Daniel P. Berrange
  2 siblings, 1 reply; 21+ messages in thread
From: Eric Blake @ 2012-01-04 20:03 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel, mdroth


On 01/04/2012 12:45 PM, Luiz Capitulino wrote:
> +    if (pid == 0) {
> +        /* child */
> +        int fd;
> +
> +        setsid();
> +        fclose(stdin);
> +        fclose(stdout);
> +        fclose(stderr);
> +
> +        execlp(pmutils_bin, pmutils_bin, NULL);

It's generally a bad idea to exec a child process without fd 0, 1, and 2
open on something, even if that something is /dev/null.  POSIX says that
the system may, but not must, reopen fds on your behalf, and that the
child without open std descriptors is then executing in a non-conforming
environment and may misbehave in unexpected manners.
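
A minimal sketch of the suggestion above, assuming the child simply wants
all three standard descriptors pointed at /dev/null before exec.
redirect_std_fds_to_devnull() is a hypothetical helper name, not something
that exists in qemu-ga.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make fds 0, 1 and 2 refer to /dev/null.
 * Returns 0 on success, -1 on error. It only uses open(), dup2() and
 * close(), so it stays clear of stdio entirely.
 */
static int redirect_std_fds_to_devnull(void)
{
    int target;
    int fd = open("/dev/null", O_RDWR);

    if (fd < 0) {
        return -1;
    }
    for (target = STDIN_FILENO; target <= STDERR_FILENO; target++) {
        /* dup2() closes 'target' if it is open and makes it a copy of fd;
           when fd == target it does nothing and returns fd. */
        if (dup2(fd, target) < 0) {
            if (fd > STDERR_FILENO) {
                close(fd);
            }
            return -1;
        }
    }
    if (fd > STDERR_FILENO) {
        close(fd);   /* the three copies are all we need */
    }
    return 0;
}
```

Because no descriptor number is ever left free, a later open() in the child
cannot land on 0, 1 or 2 by accident.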

> +
> +        /* 
> +         * The exec call should not return, if it does something went wrong.
> +         * In this case we try to suspend manually if 'mode' is 'hibernate'
> +         */
> +        slog("could not execute %s: %s\n", pmutils_bin, strerror(errno));
> +        slog("trying to suspend using the manual method...\n");
> +
> +        fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);

Worse, since you _just_ closed stdin above, fd here will most likely be
0, but an O_WRONLY stdin is asking for problems.
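
The descriptor reuse described here follows from open() returning the
lowest-numbered free descriptor. A small sketch that makes it visible, with
/dev/null standing in for /sys/power/state; fd_after_closing_stdin() is a
hypothetical name used only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demo: close fd 0, open a file write-only, and report which
 * descriptor number open() handed out. The original stdin is restored
 * before returning so the caller is not disturbed.
 */
static int fd_after_closing_stdin(void)
{
    int saved = dup(STDIN_FILENO);   /* -1 if stdin was already closed */
    int fd;

    close(STDIN_FILENO);
    fd = open("/dev/null", O_WRONLY);   /* stand-in for /sys/power/state */
    if (fd >= 0) {
        close(fd);
    }
    if (saved >= 0) {
        dup2(saved, STDIN_FILENO);   /* put the real stdin back */
        close(saved);
    }
    return fd;   /* lowest free descriptor at the time of open() */
}
```

In a single-threaded process this returns 0, which is exactly the
write-only "stdin" the review warns about.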

> +        if (fd < 0) {
> +            slog("can't open file %s: %s\n", LINUX_SYS_STATE_FILE,
> +                    strerror(errno));

Also, I have no idea where slog() writes to, but since you closed
stderr, if slog() is trying to use stderr, your error messages would be
invisible.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-04 19:45 [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command Luiz Capitulino
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 1/2] qemu-ga: set O_NONBLOCK for serial channels Luiz Capitulino
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command Luiz Capitulino
@ 2012-01-05 10:16 ` Daniel P. Berrange
  2012-01-05 12:37   ` Luiz Capitulino
  2 siblings, 1 reply; 21+ messages in thread
From: Daniel P. Berrange @ 2012-01-05 10:16 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel, mdroth

On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
> This version drops modes 'sleep' and 'hybrid' because they don't work
> properly due to issues in qemu. Only the 'hibernate' mode is supported
> for now.

IMHO this is short-sighted. When the bugs in QEMU are fixed so that
these modes work, you have needlessly put users in the situation where
they have to now upgrade the guest agent everywhere to take advantage
of the bugfix.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


* Re: [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command
  2012-01-04 20:03   ` Eric Blake
@ 2012-01-05 12:29     ` Luiz Capitulino
  0 siblings, 0 replies; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-05 12:29 UTC (permalink / raw)
  To: Eric Blake; +Cc: amit.shah, jcody, qemu-devel, mdroth

On Wed, 04 Jan 2012 13:03:26 -0700
Eric Blake <eblake@redhat.com> wrote:

> On 01/04/2012 12:45 PM, Luiz Capitulino wrote:
> > +    if (pid == 0) {
> > +        /* child */
> > +        int fd;
> > +
> > +        setsid();
> > +        fclose(stdin);
> > +        fclose(stdout);
> > +        fclose(stderr);
> > +
> > +        execlp(pmutils_bin, pmutils_bin, NULL);
> 
> It's generally a bad idea to exec a child process without fd 0, 1, and 2
> open on something, even if that something is /dev/null.  POSIX says that
> the system may, but not must, reopen fds on your behalf, and that the
> child without open std descriptors is then executing in a non-conforming
> environment and may misbehave in unexpected manners.

You're right. I just copied what we do in qmp_guest_shutdown()... I think we
have to open /dev/null then, do you agree Michael?

I think I'm going to use dup2(), like dup2(dev_null_fd, 0). Any better
recommendation?

> 
> > +
> > +        /* 
> > +         * The exec call should not return, if it does something went wrong.
> > +         * In this case we try to suspend manually if 'mode' is 'hibernate'
> > +         */
> > +        slog("could not execute %s: %s\n", pmutils_bin, strerror(errno));
> > +        slog("trying to suspend using the manual method...\n");
> > +
> > +        fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
> 
> Worse, since you _just_ closed stdin above, fd here will most likely be
> 0, but an O_WRONLY stdin is asking for problems.
> 
> > +        if (fd < 0) {
> > +            slog("can't open file %s: %s\n", LINUX_SYS_STATE_FILE,
> > +                    strerror(errno));
> 
> Also, I have no idea where slog() writes to, but since you closed
> stderr, if slog() is trying to use stderr, your error messages would be
> invisible.
> 


* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 10:16 ` [Qemu-devel] [PATCH v4 0/2]: " Daniel P. Berrange
@ 2012-01-05 12:37   ` Luiz Capitulino
  2012-01-05 12:59     ` Daniel P. Berrange
  0 siblings, 1 reply; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-05 12:37 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: amit.shah, jcody, qemu-devel, mdroth

On Thu, 5 Jan 2012 10:16:30 +0000
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
> > This version drops modes 'sleep' and 'hybrid' because they don't work
> > properly due to issues in qemu. Only the 'hibernate' mode is supported
> > for now.
> 
> IMHO this is short-sighted. When the bugs in QEMU are fixed so that
> these modes work, you have needlessly put users in the situation where
> they have to now upgrade the guest agent everywhere to take advantage
> of the bugfix.

That was my thinking until v4. But after discussing with Michael the issues
we have with S3 I concluded that it doesn't make sense to offer an API to
something that doesn't work; this will just generate bug reports. Also,
updating to get new features is normal and expected.

I'm willing to change my mind if I'm the only one thinking like this though.


* Re: [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command
  2012-01-04 19:45 ` [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command Luiz Capitulino
  2012-01-04 20:00   ` Michael Roth
  2012-01-04 20:03   ` Eric Blake
@ 2012-01-05 12:46   ` Daniel P. Berrange
  2012-01-05 12:58     ` Luiz Capitulino
  2 siblings, 1 reply; 21+ messages in thread
From: Daniel P. Berrange @ 2012-01-05 12:46 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel, mdroth

On Wed, Jan 04, 2012 at 05:45:13PM -0200, Luiz Capitulino wrote:
> diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c
> index a09c8ca..19f29c6 100644
> --- a/qga/guest-agent-commands.c
> +++ b/qga/guest-agent-commands.c
> @@ -574,6 +574,61 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
>  }
>  #endif
>  
> +#define LINUX_SYS_STATE_FILE "/sys/power/state"
> +
> +void qmp_guest_suspend(const char *mode, Error **err)
> +{
> +    pid_t pid;
> +    const char *pmutils_bin;
> +
> +    /* TODO implement 'sleep' and 'hybrid' modes once qemu is fixed to
> +       support them */
> +    if (strcmp(mode, "hibernate") == 0) {
> +        pmutils_bin = "pm-hibernate";
> +    } else {
> +        error_set(err, QERR_INVALID_PARAMETER, "mode");
> +        return;
> +    }
> +
> +    pid = fork();
> +    if (pid == 0) {
> +        /* child */
> +        int fd;
> +
> +        setsid();
> +        fclose(stdin);
> +        fclose(stdout);
> +        fclose(stderr);
> +
> +        execlp(pmutils_bin, pmutils_bin, NULL);

Strictly speaking, in multi-threaded programs, between fork() and
exec() you are restricted to calling functions which are marked as
async-signal safe in POSIX spec - fclose() is not.

Also, if there was unflushed buffered output on stdout, calling
fclose() in the child will flush that output, but then the parent
process will also flush it some time later, causing duplicated
stdout data.

NB, you might not think qemu-ga is multi-threaded, but depending on
which GLib APIs you're calling, you might find you are in fact using
threads behind the scenes without realizing, so I think it is wise
to be conservative here & assume threads are possible.

Thus you really want to use a plain old 'close()' call, and then
re-open to /dev/null as Eric suggests, leaving stdin/out/err FILE*
alone.

> +
> +        /* 
> +         * The exec call should not return, if it does something went wrong.
> +         * In this case we try to suspend manually if 'mode' is 'hibernate'
> +         */
> +        slog("could not execute %s: %s\n", pmutils_bin, strerror(errno));
> +        slog("trying to suspend using the manual method...\n");
> +
> +        fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
> +        if (fd < 0) {
> +            slog("can't open file %s: %s\n", LINUX_SYS_STATE_FILE,
> +                    strerror(errno));
> +            exit(1);
> +        }
> +
> +        if (write(fd, "disk", 4) < 0) {
> +            slog("can't write to %s: %s\n", LINUX_SYS_STATE_FILE,
> +                    strerror(errno));
> +            exit(1);
> +        }
> +
> +        exit(0);

exit() is also not async-signal safe, because it calls into stdio
to flush buffers. So you want to use _exit() instead for these.
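
The buffering hazards raised in this message (fclose() and exit() in a
forked child) can be reproduced with a short sketch. It uses a pipe wrapped
in a FILE* in place of the real stdout, and bytes_seen_after_fork() and the
child_style enum are hypothetical names chosen for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_style { CHILD_FCLOSE, CHILD_EXIT, CHILD_RAW_CLOSE };

/*
 * Hypothetical demo: buffer 5 bytes in a stdio stream, fork, let the child
 * finish in the given style, then count how many bytes reached the pipe.
 * Returns the byte count, or -1 on setup failure.
 */
static long bytes_seen_after_fork(enum child_style style)
{
    int pipefd[2];
    FILE *out;
    pid_t pid;
    char buf[64];
    ssize_t n;
    long total = 0;

    if (pipe(pipefd) < 0) {
        return -1;
    }
    out = fdopen(pipefd[1], "w");
    if (out == NULL) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    fputs("hello", out);   /* sits in the stdio buffer, nothing written yet */

    pid = fork();
    if (pid < 0) {
        fclose(out);
        close(pipefd[0]);
        return -1;
    }
    if (pid == 0) {
        close(pipefd[0]);
        switch (style) {
        case CHILD_FCLOSE:
            fclose(out);        /* flushes the child's copy of the buffer */
            _exit(0);
        case CHILD_EXIT:
            exit(0);            /* exit() flushes every open stream */
        case CHILD_RAW_CLOSE:
            close(pipefd[1]);   /* drops the fd, leaves the buffer alone */
            _exit(0);
        }
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    fclose(out);   /* the parent flushes its own copy of "hello" */
    while ((n = read(pipefd[0], buf, sizeof(buf))) > 0) {
        total += n;
    }
    close(pipefd[0]);
    return total;
}
```

With fclose() or exit() in the child the pipe receives the buffered text
twice; with close() followed by _exit() it arrives once, from the parent
only.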

The impl of slog() doesn't look too safe to me either.

Regards,
Daniel


* Re: [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command
  2012-01-05 12:46   ` Daniel P. Berrange
@ 2012-01-05 12:58     ` Luiz Capitulino
  0 siblings, 0 replies; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-05 12:58 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: amit.shah, jcody, qemu-devel, mdroth

On Thu, 5 Jan 2012 12:46:56 +0000
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Wed, Jan 04, 2012 at 05:45:13PM -0200, Luiz Capitulino wrote:
> > diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c
> > index a09c8ca..19f29c6 100644
> > --- a/qga/guest-agent-commands.c
> > +++ b/qga/guest-agent-commands.c
> > @@ -574,6 +574,61 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
> >  }
> >  #endif
> >  
> > +#define LINUX_SYS_STATE_FILE "/sys/power/state"
> > +
> > +void qmp_guest_suspend(const char *mode, Error **err)
> > +{
> > +    pid_t pid;
> > +    const char *pmutils_bin;
> > +
> > +    /* TODO implement 'sleep' and 'hybrid' modes once qemu is fixed to
> > +       support them */
> > +    if (strcmp(mode, "hibernate") == 0) {
> > +        pmutils_bin = "pm-hibernate";
> > +    } else {
> > +        error_set(err, QERR_INVALID_PARAMETER, "mode");
> > +        return;
> > +    }
> > +
> > +    pid = fork();
> > +    if (pid == 0) {
> > +        /* child */
> > +        int fd;
> > +
> > +        setsid();
> > +        fclose(stdin);
> > +        fclose(stdout);
> > +        fclose(stderr);
> > +
> > +        execlp(pmutils_bin, pmutils_bin, NULL);
> 
> Strictly speaking, in multi-threaded programs, between fork() and
> exec() you are restricted to calling functions which are marked as
> async-signal safe in POSIX spec - fclose() is not.
> 
> Also, if there was unflushed buffered output on stdout, calling
> fclose() in the child will flush that output, but then the parent
> process will also flush it some time later, causing duplicated
> stdout data.
> 
> NB, you might not think qemu-ga is multi-threaded, but depending on
> which GLib APIs you're calling, you might find you are in fact using
> threads behind the scenes without realizing, so I think it is wise
> to be conservative here & assume threads are possible.

All good points.

> Thus you really want to use a plain old 'close()' call, and then
> re-open to /dev/null as Eric suggests, leaving stdin/out/err FILE*
> alone.

I'm going to use dup2(), which seems to be ok in that regard.

> 
> > +
> > +        /* 
> > +         * The exec call should not return, if it does something went wrong.
> > +         * In this case we try to suspend manually if 'mode' is 'hibernate'
> > +         */
> > +        slog("could not execute %s: %s\n", pmutils_bin, strerror(errno));
> > +        slog("trying to suspend using the manual method...\n");
> > +
> > +        fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
> > +        if (fd < 0) {
> > +            slog("can't open file %s: %s\n", LINUX_SYS_STATE_FILE,
> > +                    strerror(errno));
> > +            exit(1);
> > +        }
> > +
> > +        if (write(fd, "disk", 4) < 0) {
> > +            slog("can't write to %s: %s\n", LINUX_SYS_STATE_FILE,
> > +                    strerror(errno));
> > +            exit(1);
> > +        }
> > +
> > +        exit(0);
> 
> exit() is also not async-signal safe, because it calls into stdio
> to flush buffers. So you want to use _exit() instead for these.

OK, I'll change it, and I'll fix qmp_guest_shutdown() in a separate patch.

> 
> The impl of slog() doesn't look too safe to me either.
> 
> Regards,
> Daniel


* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 12:37   ` Luiz Capitulino
@ 2012-01-05 12:59     ` Daniel P. Berrange
  2012-01-05 14:42       ` Luiz Capitulino
  2012-01-05 15:04       ` Michael Roth
  0 siblings, 2 replies; 21+ messages in thread
From: Daniel P. Berrange @ 2012-01-05 12:59 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel, mdroth

On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
> On Thu, 5 Jan 2012 10:16:30 +0000
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
> > > This version drops modes 'sleep' and 'hybrid' because they don't work
> > > properly due to issues in qemu. Only the 'hibernate' mode is supported
> > > for now.
> > 
> > > IMHO this is short-sighted. When the bugs in QEMU are fixed so that
> > these modes work, you have needlessly put users in the situation where
> > they have to now upgrade the guest agent everywhere to take advantage
> > of the bugfix.
> 
> That was my thinking until v4. But after discussing with Michael the issues
> we have with S3 I concluded that it doesn't make sense to offer an API to
> something that doesn't work; this will just generate bug reports. Also,
> updating to get new features is normal and expected.

This assumes that users will always upgrade their VMs & hosts in
lock step, which I rather doubt they will in practice. E.g., imagine a
deployment with a mixture of hosts, some running QEMU 1.1 (broken S3)
and some QEMU 1.2 (working S3). If they build VM disk images they will likely
use the QEMU GA from 1.2 for all their builds, even if many of them
will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
'hybrid' commands available in the guest agent, even though the host
QEMU doesn't work properly.

So you *will* ultimately need to make sure that QEMU GA from 1.2 has
sensible behaviour when run on a QEMU 1.1 host.  If you don't address
this during 1.1, you may well find yourself in an un-winnable situation
for 1.2, where it is impossible to provide good behaviour on old hosts.

So IMHO we are better off in the long run, if we include all commands
right now, even though some don't work yet, and work to ensure we have
good error reporting behaviour for those that don't work.

As an example, if S3 is broken in current QEMU, then we should not be
advertising S3 to the guest OS. This would allow 'pm-is-supported --suspend'
to return false, at which point the guest agent can send back a nice error
message 'Suspend is not supported on this host', instead of just having the
guest try to suspend & hang or worse.
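
A minimal sketch of that check, written in Python for brevity (qemu-ga
itself is C). It assumes pm-utils' pm-is-supported is installed in the
guest and exits with status 0 when the requested mode is supported. The
function names, the error strings and the injectable `run` parameter are
illustrative assumptions, not part of the patch.

```python
import subprocess

# Map guest-suspend modes to the pm-is-supported flags from pm-utils.
PM_FLAGS = {
    "hibernate": "--hibernate",
    "sleep": "--suspend",
    "hybrid": "--suspend-hybrid",
}


def default_run(argv):
    """Run a command and return its exit status (127 if it is missing)."""
    try:
        return subprocess.run(argv, check=False).returncode
    except FileNotFoundError:
        return 127


def check_suspend_mode(mode, run=default_run):
    """Return None if the mode can be attempted, else an error message."""
    flag = PM_FLAGS.get(mode)
    if flag is None:
        return "Unknown suspend mode '%s'" % mode
    # A non-zero exit status means the guest OS does not see this state.
    if run(["pm-is-supported", flag]) != 0:
        return "Suspend mode '%s' is not supported on this host" % mode
    return None
```

The check only helps if the host really stops advertising S3; a guest
that still sees S3 would pass it and attempt the suspend anyway.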

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 12:59     ` Daniel P. Berrange
@ 2012-01-05 14:42       ` Luiz Capitulino
  2012-01-05 15:10         ` Michael Roth
  2012-01-05 15:04       ` Michael Roth
  1 sibling, 1 reply; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-05 14:42 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: amit.shah, jcody, qemu-devel, mdroth

On Thu, 5 Jan 2012 12:59:27 +0000
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
> > On Thu, 5 Jan 2012 10:16:30 +0000
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> > > On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
> > > > This version drops modes 'sleep' and 'hybrid' because they don't work
> > > > properly due to issues in qemu. Only the 'hibernate' mode is supported
> > > > for now.
> > > 
> > > IMHO this is short-sighted. When the bugs QEMU in are fixed so that
> > > these modes work, you have needlessly put users in the situation where
> > > they have to now upgrade the guest agent everywhere to take advantage
> > > of the bugfix.
> > 
> > That was my thinking until v4. But after discussing with Michael the issues
> > we have with S3 I concluded that it doesn't make sense to offer an API to
> > something that doesn't work, this will just generate bug reports. Also,
> > updating to get new features is normal and expected.
> 
> This is assuming that users will always upgrade their VMs & hosts in
> lock step, which I rather doubt they will in practice. eg imagine a
> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
> and QEMU 1.2 (working S3). If they build VM disk images they will likely
> use the QEMU GA from 1.2 for all their builds, even if many of them
> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
> 'hybrid' commands available in the guest agent, even though the host
> QEMU doesn't work properly.
> 
> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
> this during 1.1, you may well find yourself in an un-winnable situation
> for 1.2, where it is impossible to provide good behaviour on old hosts.
> 
> So IMHO we are better off in the long run, if we include all commands
> right now, even though some don't work yet, and work to ensure we have
> good error reporting behaviour for those that don't work.

Yes, I agree. As a side note: if we add error reporting it will only work
on 1.1 and later. I.e., the problem you describe above will still happen
with 1.0.

But what you're suggesting seems to be the right thing to do. Do you agree,
Michael?

> As an example, if S3 is broken in current QEMU, then we should not be
> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
> to return false, at which point the guest agent can send back a nice error
> message 'Suspend is not supported on this host', instead of just having the
> guest try to suspend & hang or worse.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 12:59     ` Daniel P. Berrange
  2012-01-05 14:42       ` Luiz Capitulino
@ 2012-01-05 15:04       ` Michael Roth
  2012-01-05 15:11         ` Daniel P. Berrange
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Roth @ 2012-01-05 15:04 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: amit.shah, jcody, qemu-devel, Luiz Capitulino

On 01/05/2012 06:59 AM, Daniel P. Berrange wrote:
> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
>> On Thu, 5 Jan 2012 10:16:30 +0000
>> "Daniel P. Berrange"<berrange@redhat.com>  wrote:
>>
>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
>>>> for now.
>>>
>>> IMHO this is short-sighted. When the bugs QEMU in are fixed so that
>>> these modes work, you have needlessly put users in the situation where
>>> they have to now upgrade the guest agent everywhere to take advantage
>>> of the bugfix.
>>
>> That was my thinking until v4. But after discussing with Michael the issues
>> we have with S3 I concluded that it doesn't make sense to offer an API to
>> something that doesn't work, this will just generate bug reports. Also,
>> updating to get new features is normal and expected.
>
> This is assuming that users will always upgrade their VMs&  hosts in
> lock step, which I rather doubt they will in practice. eg imagine a
> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
> and QEMU 1.2 (working S3). If they build VM disk images they will likely
> use the QEMU GA from 1.2 for all their builds, even if many of them
> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
> 'hybrid' commands available in the guest agent, even though the host
> QEMU doesn't work properly.
>
> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
> this during 1.1, you may well find yourself in an un-winnable situation
> for 1.2, where it is impossible to provide good behaviour on old hosts.
>
> So IMHO we are better off in the long run, if we include all commands
> right now, even though some don't work yet, and work to ensure we have
> good error reporting behaviour for those that don't work.
>
> As an example, if S3 is broken in current QEMU, then we should not be
> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
> to return false, at which point the guest agent can send back a nice error
> message 'Suspend is not supported on this host', instead of just having the
> guest try to suspend&  hang or worse.

This still requires that we're in lockstep with host QEMU (ideally that
would be the case via push-deployment of the agent, exactly because of
issues like this. Or at least, it'd make the upgrade process painless).
And outside of that, I really cannot think of any nice way to check, from
the agent, that a host has the required functionality for {this,an} RPC.
Not unless we forced a bi-directional capabilities negotiation sequence,
and I don't like the idea of injecting this kind of data into a guest.
libvirt could maybe filter the modes based on QEMU version, but that's
not the only consumer of the agent.

Really I think this is a case study for why push-deployment of agents is 
the way to go. QEMU could query qemu-ga directly and generate an 'agent 
update available' event that users/frontends can use to prompt an update 
to the latest version. Then all the upgrade inertia involved with saving 
code/features for subsequent agent versions is mostly gone, and we can 
"do the right thing" with regard to broken functionality :)

Unfortunately that option isn't available yet. But it just seems wrong
to introduce something we know is broken, to the extent that even those
involved with its development aren't currently capable of testing it
fully. I mean, we know what the user expectations are, and it's not
that, unfortunately for us :( I'd be more open to it if the bug weren't
so bad, but nuking your guest's working state every time you make the
mistake of hitting the pretty "sleep" button in virt-manager or whatever
is pretty bad.

>
> Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 14:42       ` Luiz Capitulino
@ 2012-01-05 15:10         ` Michael Roth
  2012-01-05 20:25           ` Luiz Capitulino
  0 siblings, 1 reply; 21+ messages in thread
From: Michael Roth @ 2012-01-05 15:10 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel

On 01/05/2012 08:42 AM, Luiz Capitulino wrote:
> On Thu, 5 Jan 2012 12:59:27 +0000
> "Daniel P. Berrange"<berrange@redhat.com>  wrote:
>
>> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
>>> On Thu, 5 Jan 2012 10:16:30 +0000
>>> "Daniel P. Berrange"<berrange@redhat.com>  wrote:
>>>
>>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
>>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
>>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
>>>>> for now.
>>>>
>>>> IMHO this is short-sighted. When the bugs QEMU in are fixed so that
>>>> these modes work, you have needlessly put users in the situation where
>>>> they have to now upgrade the guest agent everywhere to take advantage
>>>> of the bugfix.
>>>
>>> That was my thinking until v4. But after discussing with Michael the issues
>>> we have with S3 I concluded that it doesn't make sense to offer an API to
>>> something that doesn't work, this will just generate bug reports. Also,
>>> updating to get new features is normal and expected.
>>
>> This is assuming that users will always upgrade their VMs&  hosts in
>> lock step, which I rather doubt they will in practice. eg imagine a
>> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
>> and QEMU 1.2 (working S3). If they build VM disk images they will likely
>> use the QEMU GA from 1.2 for all their builds, even if many of them
>> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
>> 'hybrid' commands available in the guest agent, even though the host
>> QEMU doesn't work properly.
>>
>> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
>> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
>> this during 1.1, you may well find yourself in an un-winnable situation
>> for 1.2, where it is impossible to provide good behaviour on old hosts.
>>
>> So IMHO we are better off in the long run, if we include all commands
>> right now, even though some don't work yet, and work to ensure we have
>> good error reporting behaviour for those that don't work.
>
> Yes, I agree. As a side note: if we add error reporting it will only work
> on 1.1 and later.  Ie, the problem you describe above will still happen
> with 1.0.
>
> But what you're suggesting seems to be the right thing to do. Do you agree
> Michael?

Agree, but unless we add an RPC that QEMU uses to advertise 
capabilities, I'm really not sure it's possible to detect whether or not 
the host will support it. And if we can't detect that reliably, we're 
better off leaving it out for now, because sleeping guests is not 
obscure functionality, and accidentally nuking guests when a user sleeps 
them (presumably because they want to retain their working state) is 
much worse than telling a user to upgrade their agent, or not supported 
or whatever.

>
>> As an example, if S3 is broken in current QEMU, then we should not be
>> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
>> to return false, at which point the guest agent can send back a nice error
>> message 'Suspend is not supported on this host', instead of just having the
>> guest try to suspend&  hang or worse.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 15:04       ` Michael Roth
@ 2012-01-05 15:11         ` Daniel P. Berrange
  2012-01-05 15:18           ` Michael Roth
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel P. Berrange @ 2012-01-05 15:11 UTC (permalink / raw)
  To: Michael Roth; +Cc: amit.shah, jcody, qemu-devel, Luiz Capitulino

On Thu, Jan 05, 2012 at 09:04:57AM -0600, Michael Roth wrote:
> On 01/05/2012 06:59 AM, Daniel P. Berrange wrote:
> >On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
> >>On Thu, 5 Jan 2012 10:16:30 +0000
> >>"Daniel P. Berrange"<berrange@redhat.com>  wrote:
> >>
> >>>On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
> >>>>This version drops modes 'sleep' and 'hybrid' because they don't work
> >>>>properly due to issues in qemu. Only the 'hibernate' mode is supported
> >>>>for now.
> >>>
> >>>IMHO this is short-sighted. When the bugs QEMU in are fixed so that
> >>>these modes work, you have needlessly put users in the situation where
> >>>they have to now upgrade the guest agent everywhere to take advantage
> >>>of the bugfix.
> >>
> >>That was my thinking until v4. But after discussing with Michael the issues
> >>we have with S3 I concluded that it doesn't make sense to offer an API to
> >>something that doesn't work, this will just generate bug reports. Also,
> >>updating to get new features is normal and expected.
> >
> >This is assuming that users will always upgrade their VMs&  hosts in
> >lock step, which I rather doubt they will in practice. eg imagine a
> >deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
> >and QEMU 1.2 (working S3). If they build VM disk images they will likely
> >use the QEMU GA from 1.2 for all their builds, even if many of them
> >will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
> >'hybrid' commands available in the guest agent, even though the host
> >QEMU doesn't work properly.
> >
> >So you *will* ultimately need to make sure that QEMU GA from 1.2, has
> >sensible behaviour when run on a QEMU 1.1 host.  If you don't address
> >this during 1.1, you may well find yourself in an un-winnable situation
> >for 1.2, where it is impossible to provide good behaviour on old hosts.
> >
> >So IMHO we are better off in the long run, if we include all commands
> >right now, even though some don't work yet, and work to ensure we have
> >good error reporting behaviour for those that don't work.
> >
> >As an example, if S3 is broken in current QEMU, then we should not be
> >advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
> >to return false, at which point the guest agent can send back a nice error
> >message 'Suspend is not supported on this host', instead of just having the
> >guest try to suspend&  hang or worse.
> 
> This still requires we're lockstep with host QEMU (ideally that
> would be the case via push-deployment of the agent, exactly because
> of issues like this. Or at least, it'd make the upgrade process
> painless). And outside of that, I really cannot think of any nice
> way to check, from the agent, that a host has required functionality
> for {this,an} RPC. Not unless we forced a bi-directional
> capabilities negotiation sequence, and I don't like the idea of
> injecting this kind of data into a guest. libvirt could maybe filter
> the modes based on QEMU version, but that's not the only consumer of
> the agent.

Err, the scenario I just described does not require lockstep
upgrade. Newer QEMU GA agent should be able to run on historical
QEMU hosts just fine. I'm also not trying to suggest we need a
general bi-directional capabilities negotiation here either.
The key is that in this particular case, QEMU should only
expose S3 to the guest if it is actually capable of working.
Then, the pm-is-supported command will 'just work'. No
host<->guest agent negotiation is required.

Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 15:11         ` Daniel P. Berrange
@ 2012-01-05 15:18           ` Michael Roth
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Roth @ 2012-01-05 15:18 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: amit.shah, jcody, qemu-devel, Luiz Capitulino

On 01/05/2012 09:11 AM, Daniel P. Berrange wrote:
> On Thu, Jan 05, 2012 at 09:04:57AM -0600, Michael Roth wrote:
>> On 01/05/2012 06:59 AM, Daniel P. Berrange wrote:
>>> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
>>>> On Thu, 5 Jan 2012 10:16:30 +0000
>>>> "Daniel P. Berrange"<berrange@redhat.com>   wrote:
>>>>
>>>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
>>>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
>>>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
>>>>>> for now.
>>>>>
>>>>> IMHO this is short-sighted. When the bugs QEMU in are fixed so that
>>>>> these modes work, you have needlessly put users in the situation where
>>>>> they have to now upgrade the guest agent everywhere to take advantage
>>>>> of the bugfix.
>>>>
>>>> That was my thinking until v4. But after discussing with Michael the issues
>>>> we have with S3 I concluded that it doesn't make sense to offer an API to
>>>> something that doesn't work, this will just generate bug reports. Also,
>>>> updating to get new features is normal and expected.
>>>
>>> This is assuming that users will always upgrade their VMs&   hosts in
>>> lock step, which I rather doubt they will in practice. eg imagine a
>>> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
>>> and QEMU 1.2 (working S3). If they build VM disk images they will likely
>>> use the QEMU GA from 1.2 for all their builds, even if many of them
>>> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
>>> 'hybrid' commands available in the guest agent, even though the host
>>> QEMU doesn't work properly.
>>>
>>> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
>>> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
>>> this during 1.1, you may well find yourself in an un-winnable situation
>>> for 1.2, where it is impossible to provide good behaviour on old hosts.
>>>
>>> So IMHO we are better off in the long run, if we include all commands
>>> right now, even though some don't work yet, and work to ensure we have
>>> good error reporting behaviour for those that don't work.
>>>
>>> As an example, if S3 is broken in current QEMU, then we should not be
>>> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
>>> to return false, at which point the guest agent can send back a nice error
>>> message 'Suspend is not supported on this host', instead of just having the
>>> guest try to suspend&   hang or worse.
>>
>> This still requires we're lockstep with host QEMU (ideally that
>> would be the case via push-deployment of the agent, exactly because
>> of issues like this. Or at least, it'd make the upgrade process
>> painless). And outside of that, I really cannot think of any nice
>> way to check, from the agent, that a host has required functionality
>> for {this,an} RPC. Not unless we forced a bi-directional
>> capabilities negotiation sequence, and I don't like the idea of
>> injecting this kind of data into a guest. libvirt could maybe filter
>> the modes based on QEMU version, but that's not the only consumer of
>> the agent.
>
> Err, the scenario I just described does not require lockstep
> upgrade. Newer QEMU GA agent should be able to run on historical
> QEMU hosts just fine. I'm also not trying to suggest we need a

Bad terminology on my part. What I mean is that if qemu-ga error
reporting requires a newer qemu, we still execute the sleep on buggy
hosts unless the host level is adequate.

> general bi-directional capabilities negotiation here either.
> The key is that in this particular case, QEMU should only
> expose S3 to the guest if it is actually capable of working.
> Then, the pm-is-supported  command will 'just work'. No
> host<->guest agent negoiation is required.
>
> Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 15:10         ` Michael Roth
@ 2012-01-05 20:25           ` Luiz Capitulino
  2012-01-05 21:41             ` Michael Roth
  0 siblings, 1 reply; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-05 20:25 UTC (permalink / raw)
  To: Michael Roth; +Cc: amit.shah, jcody, qemu-devel

On Thu, 05 Jan 2012 09:10:50 -0600
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> On 01/05/2012 08:42 AM, Luiz Capitulino wrote:
> > On Thu, 5 Jan 2012 12:59:27 +0000
> > "Daniel P. Berrange"<berrange@redhat.com>  wrote:
> >
> >> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
> >>> On Thu, 5 Jan 2012 10:16:30 +0000
> >>> "Daniel P. Berrange"<berrange@redhat.com>  wrote:
> >>>
> >>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
> >>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
> >>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
> >>>>> for now.
> >>>>
> >>>> IMHO this is short-sighted. When the bugs QEMU in are fixed so that
> >>>> these modes work, you have needlessly put users in the situation where
> >>>> they have to now upgrade the guest agent everywhere to take advantage
> >>>> of the bugfix.
> >>>
> >>> That was my thinking until v4. But after discussing with Michael the issues
> >>> we have with S3 I concluded that it doesn't make sense to offer an API to
> >>> something that doesn't work, this will just generate bug reports. Also,
> >>> updating to get new features is normal and expected.
> >>
> >> This is assuming that users will always upgrade their VMs&  hosts in
> >> lock step, which I rather doubt they will in practice. eg imagine a
> >> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
> >> and QEMU 1.2 (working S3). If they build VM disk images they will likely
> >> use the QEMU GA from 1.2 for all their builds, even if many of them
> >> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
> >> 'hybrid' commands available in the guest agent, even though the host
> >> QEMU doesn't work properly.
> >>
> >> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
> >> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
> >> this during 1.1, you may well find yourself in an un-winnable situation
> >> for 1.2, where it is impossible to provide good behaviour on old hosts.
> >>
> >> So IMHO we are better off in the long run, if we include all commands
> >> right now, even though some don't work yet, and work to ensure we have
> >> good error reporting behaviour for those that don't work.
> >
> > Yes, I agree. As a side note: if we add error reporting it will only work
> > on 1.1 and later.  Ie, the problem you describe above will still happen
> > with 1.0.
> >
> > But what you're suggesting seems to be the right thing to do. Do you agree
> > Michael?
> 
> Agree, but unless we add an RPC that QEMU uses to advertise 
> capabilities, I'm really not sure it's possible to detect whether or not 
> the host will support it.

You mean an RPC to advertise if 'sleep' is supported? I think this is best done
by making guest-suspend return an error as suggested by Daniel, otherwise a
client that doesn't query for capabilities might run into trouble.

There's an important detail though: we need to make qemu not advertise S3 for
this to work. However, we might be able to fix S3 for 1.1 (and bugs, like the
S4 ones, can't be detected, limiting the scope of the 'unsupported' error).

So, we could merge all modes and commit to get S3 fixed for 1.1 :)

> And if we can't detect that reliably, we're 
> better off leaving it out for now, because sleeping guests is not 
> obscure functionality, and accidentally nuking guests when a user sleeps 
> them (presumably because they want to retain their working state) is 
> much worse than telling a user to upgrade their agent, or not supported 
> or whatever.
> 
> >
> >> As an example, if S3 is broken in current QEMU, then we should not be
> >> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
> >> to return false, at which point the guest agent can send back a nice error
> >> message 'Suspend is not supported on this host', instead of just having the
> >> guest try to suspend&  hang or worse.
> >
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 20:25           ` Luiz Capitulino
@ 2012-01-05 21:41             ` Michael Roth
  2012-01-06 19:04               ` Luiz Capitulino
  0 siblings, 1 reply; 21+ messages in thread
From: Michael Roth @ 2012-01-05 21:41 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel

On 01/05/2012 02:25 PM, Luiz Capitulino wrote:
> On Thu, 05 Jan 2012 09:10:50 -0600
> Michael Roth<mdroth@linux.vnet.ibm.com>  wrote:
>
>> On 01/05/2012 08:42 AM, Luiz Capitulino wrote:
>>> On Thu, 5 Jan 2012 12:59:27 +0000
>>> "Daniel P. Berrange"<berrange@redhat.com>   wrote:
>>>
>>>> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
>>>>> On Thu, 5 Jan 2012 10:16:30 +0000
>>>>> "Daniel P. Berrange"<berrange@redhat.com>   wrote:
>>>>>
>>>>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
>>>>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
>>>>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
>>>>>>> for now.
>>>>>>
>>>>>> IMHO this is short-sighted. When the bugs QEMU in are fixed so that
>>>>>> these modes work, you have needlessly put users in the situation where
>>>>>> they have to now upgrade the guest agent everywhere to take advantage
>>>>>> of the bugfix.
>>>>>
>>>>> That was my thinking until v4. But after discussing with Michael the issues
>>>>> we have with S3 I concluded that it doesn't make sense to offer an API to
>>>>> something that doesn't work, this will just generate bug reports. Also,
>>>>> updating to get new features is normal and expected.
>>>>
>>>> This is assuming that users will always upgrade their VMs&   hosts in
>>>> lock step, which I rather doubt they will in practice. eg imagine a
>>>> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
>>>> and QEMU 1.2 (working S3). If they build VM disk images they will likely
>>>> use the QEMU GA from 1.2 for all their builds, even if many of them
>>>> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
>>>> 'hybrid' commands available in the guest agent, even though the host
>>>> QEMU doesn't work properly.
>>>>
>>>> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
>>>> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
>>>> this during 1.1, you may well find yourself in an un-winnable situation
>>>> for 1.2, where it is impossible to provide good behaviour on old hosts.
>>>>
>>>> So IMHO we are better off in the long run, if we include all commands
>>>> right now, even though some don't work yet, and work to ensure we have
>>>> good error reporting behaviour for those that don't work.
>>>
>>> Yes, I agree. As a side note: if we add error reporting it will only work
>>> on 1.1 and later.  Ie, the problem you describe above will still happen
>>> with 1.0.
>>>
>>> But what you're suggesting seems to be the right thing to do. Do you agree
>>> Michael?
>>
>> Agree, but unless we add an RPC that QEMU uses to advertise
>> capabilities, I'm really not sure it's possible to detect whether or not
>> the host will support it.
>
> You mean an RPC to advertise if 'sleep' is supported? I think this is best done
> by making guest-suspend return an error as suggested by Daniel, otherwise a
> client that doesn't query for capabilities might run in trouble.

Agreed, but what I mean is that if the user executes the suspend using
an up-level agent running on a down-level 1.0 host, the agent will still
see s3 advertised and issue the buggy suspend. That's why I suggested
the host->agent capabilities reporting as a possible (but somewhat ugly)
way to just simply tell the agent it can handle it (and, lacking that,
assume that it can't).

>
> There's an important detail though: we need to make qemu not advertise S3 for
> this to work. However, we might be able to fix S3 for 1.1 (and bugs, like the
> S4 ones, can't be detected, limiting the scope of the 'unsupported' error).
>
> So, we could merge all modes and commit to get S3 fixed for 1.1 :)

No disagreement there, if we can commit to making qemu-ga/qemu 1.1 
releases interoperable in this manner (whether by fixing s3 or not 
advertising it), I think that approach is perfectly fine, ideal even. 
Doing a 1.1 release where qemu and qemu-ga are not interoperable (qemu 
missing s3 support, qemu-ga using s3) was my main objection.

But there is a 2nd topic here I'm trying to mull over: what is qemu-ga's
support policy for down-level hosts? Backward-compatible? Incompatible?

The above approach to this problem suggests the latter (qemu-ga 1.1 has
RPCs that will knowingly break 1.0 qemu instances). We could solve this
by introducing the capabilities negotiation I mentioned earlier. It
actually wouldn't need to be anything other than qemu telling qemu-ga
what qemu-ga version-level it supports. By default we assume 1.0, and
limit qemu-ga to that until qemu-ga is told otherwise (so, no
sleep/hybrid suspend modes). For new RPCs we may be able to handle this
versioning automatically, since we include qemu version levels for the
RPCs in the schema. For functionality within an RPC (like sleep/hybrid
suspend modes) we could use conditional code.

If we take that approach (maintaining backward-compatibility), we'd need 
to introduce that code in the agent now, and require qemu/libvirt 
execute the guest-set-support-level RPC or whatever to access these 1.1 
features.
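
A rough sketch of that gating, in Python for brevity (qemu-ga itself is
C). Everything here is hypothetical: guest-set-support-level is only a
proposed name, and the table of minimum levels per mode is an assumption
made for illustration, with 'hibernate' taken as the only mode allowed
by default.

```python
# Hypothetical table: lowest support level at which each mode is allowed.
MODE_MIN_LEVEL = {
    "hibernate": (1, 0),
    "sleep": (1, 1),
    "hybrid": (1, 1),
}


class Agent:
    def __init__(self):
        # Assume a 1.0 host until the client tells us otherwise.
        self.support_level = (1, 0)

    def set_support_level(self, major, minor):
        """Stand-in for the proposed guest-set-support-level RPC."""
        self.support_level = (major, minor)

    def suspend_allowed(self, mode):
        """True if the negotiated level permits this suspend mode."""
        needed = MODE_MIN_LEVEL.get(mode)
        return needed is not None and self.support_level >= needed
```

With this shape, an agent that is never told the host level keeps
refusing 'sleep' and 'hybrid', which is the safe default on 1.0 hosts.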

Technically, there's a required RPC qemu-ga clients need to execute 
already: guest-sync. It's required because we have no way to reliably 
detect EOF over virtio-serial, and thus an agent may send stale data to 
a newly-connected qemu-ga client, so the client needs to do the 
guest-sync command to find the expected response and re-sync the 
streams. We could roll the guest-set-support-level functionality into 
that. Basically just add another field.
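
For reference, the client side of that re-sync can be sketched as
follows, in Python for brevity. It assumes responses arrive one JSON
object per line and that guest-sync echoes the id back in "return"; the
helper names are illustrative. The extra support-level field is not
shown, since it does not exist yet.

```python
import json


def make_sync_request(sync_id):
    """Build the guest-sync request for a given id."""
    return json.dumps({"execute": "guest-sync", "arguments": {"id": sync_id}})


def resync(lines, sync_id):
    """Discard stale responses until the guest-sync reply shows up.

    `lines` is an iterable of raw response lines read from the channel.
    Returns the number of stale lines skipped, or raises if no reply
    matching `sync_id` is found.
    """
    skipped = 0
    for line in lines:
        try:
            msg = json.loads(line)
        except ValueError:
            skipped += 1  # partial or garbled leftover data
            continue
        if isinstance(msg, dict) and msg.get("return") == sync_id:
            return skipped
        skipped += 1  # well-formed, but meant for an earlier client
    raise RuntimeError("no guest-sync reply seen")
```

Picking a fresh, hard-to-guess id for each connection keeps a stale
reply from being mistaken for the current one.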

>
>> And if we can't detect that reliably, we're
>> better off leaving it out for now, because sleeping guests is not
>> obscure functionality, and accidentally nuking guests when a user sleeps
>> them (presumably because they want to retain their working state) is
>> much worse than telling a user to upgrade their agent, or not supported
>> or whatever.
>>
>>>
>>>> As an example, if S3 is broken in current QEMU, then we should not be
>>>> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
>>>> to return false, at which point the guest agent can send back a nice error
>>>> message 'Suspend is not supported on this host', instead of just having the
>>>> guest try to suspend&   hang or worse.
>>>
>>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-05 21:41             ` Michael Roth
@ 2012-01-06 19:04               ` Luiz Capitulino
  2012-01-06 21:03                 ` Michael Roth
  0 siblings, 1 reply; 21+ messages in thread
From: Luiz Capitulino @ 2012-01-06 19:04 UTC (permalink / raw)
  To: Michael Roth; +Cc: amit.shah, jcody, qemu-devel

On Thu, 05 Jan 2012 15:41:33 -0600
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> On 01/05/2012 02:25 PM, Luiz Capitulino wrote:
> > On Thu, 05 Jan 2012 09:10:50 -0600
> > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> >
> >> On 01/05/2012 08:42 AM, Luiz Capitulino wrote:
> >>> On Thu, 5 Jan 2012 12:59:27 +0000
> >>> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> >>>
> >>>> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
> >>>>> On Thu, 5 Jan 2012 10:16:30 +0000
> >>>>> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> >>>>>
> >>>>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
> >>>>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
> >>>>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
> >>>>>>> for now.
> >>>>>>
> >>>>>> IMHO this is short-sighted. When the bugs in QEMU are fixed so that
> >>>>>> these modes work, you have needlessly put users in the situation where
> >>>>>> they have to now upgrade the guest agent everywhere to take advantage
> >>>>>> of the bugfix.
> >>>>>
> >>>>> That was my thinking until v4. But after discussing with Michael the issues
> >>>>> we have with S3 I concluded that it doesn't make sense to offer an API to
> >>>>> something that doesn't work, this will just generate bug reports. Also,
> >>>>> updating to get new features is normal and expected.
> >>>>
> >>>> This is assuming that users will always upgrade their VMs & hosts in
> >>>> lock step, which I rather doubt they will in practice. eg imagine a
> >>>> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
> >>>> and QEMU 1.2 (working S3). If they build VM disk images they will likely
> >>>> use the QEMU GA from 1.2 for all their builds, even if many of them
> >>>> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
> >>>> 'hybrid' commands available in the guest agent, even though the host
> >>>> QEMU doesn't work properly.
> >>>>
> >>>> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
> >>>> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
> >>>> this during 1.1, you may well find yourself in an un-winnable situation
> >>>> for 1.2, where it is impossible to provide good behaviour on old hosts.
> >>>>
> >>>> So IMHO we are better off in the long run, if we include all commands
> >>>> right now, even though some don't work yet, and work to ensure we have
> >>>> good error reporting behaviour for those that don't work.
> >>>
> >>> Yes, I agree. As a side note: if we add error reporting it will only work
> >>> on 1.1 and later.  Ie, the problem you describe above will still happen
> >>> with 1.0.
> >>>
> >>> But what you're suggesting seems to be the right thing to do. Do you agree
> >>> Michael?
> >>
> >> Agree, but unless we add an RPC that QEMU uses to advertise
> >> capabilities, I'm really not sure it's possible to detect whether or not
> >> the host will support it.
> >
> > You mean an RPC to advertise if 'sleep' is supported? I think this is best done
> > by making guest-suspend return an error as suggested by Daniel, otherwise a
> > client that doesn't query for capabilities might run into trouble.
> 
> Agreed, but what I mean is that if the user executes the suspend using
> an up-level agent running on a down-level 1.0 host, the agent will still
> see s3 advertised and issue the buggy suspend. That's why I suggested 
> the host->agent capabilities reporting as a possible (but somewhat ugly) 
> way to just simply tell the agent it can handle it (and, lacking that, 
> assume that it can't).

That makes sense.

> 
> >
> > There's an important detail though: we need to make qemu not advertise S3 for
> > this to work. However, we might be able to fix S3 for 1.1 (and bugs, like the
> > S4 ones, can't be detected, limiting the scope of the 'unsupported' error).
> >
> > So, we could merge all modes and commit to get S3 fixed for 1.1 :)
> 
> No disagreement there, if we can commit to making qemu-ga/qemu 1.1 
> releases interoperable in this manner (whether by fixing s3 or not 
> advertising it), I think that approach is perfectly fine, ideal even. 
> Doing a 1.1 release where qemu and qemu-ga are not interoperable (qemu 
> missing s3 support, qemu-ga using s3) was my main objection.

I see.

> But there is a 2nd topic here I'm trying to mull over: what is qemu-ga's 
> support policy for down-level hosts? backward-compatible? incompatible?

That's a good question. I think we should be backward-compatible, but
that's not going to be trivial.

> The above approach to this problem suggests the latter (qemu-ga 1.1 has 
> RPCs that will knowingly break 1.0 qemu instances). We could solve this 
> by introducing the capabilities negotiation I mentioned earlier. It
> actually wouldn't need to be anything other than qemu telling qemu-ga 
> what qemu-ga version-level it supports. By default we assume 1.0, and 
> limit qemu-ga to that until qemu-ga is told otherwise (so, no 
> sleep/hybrid suspend modes). For new RPCs we may be able to handle this 
> version automatically, since we include qemu version levels for the RPCs 
> in the schema. For functionality within an RPC (like sleep/hybrid 
> suspend modes) we could use conditional code.
> 
> If we take that approach (maintaining backward-compatibility), we'd need 
> to introduce that code in the agent now, and require qemu/libvirt 
> execute the guest-set-support-level RPC or whatever to access these 1.1 
> features.

What does guest-set-support-level do? Does it enable all post-1.1 features?

A different approach would be to add a new field to the command dict in
the schema file, say 'broken-in-qemu-version', and change qemu-ga to check
that field in its main loop before executing a command. If
'broken-in-qemu-version' <= qemu version, qemu-ga returns a 'not supported'
error.

For commands like guest-suspend, which are only partially supported, we'd
have to do a manual check for the qemu version, as you suggested above.

That's just an idea though, I'm not sure what the best way to do this is.

> 
> Technically, there's a required RPC qemu-ga clients need to execute 
> already: guest-sync. It's required because we have no way to reliably 
> detect EOF over virtio-serial, and thus an agent may send stale data to 
> a newly-connected qemu-ga client, so the client needs to do the 
> guest-sync command to find the expected response and re-sync the 
> streams. We could roll the guest-set-support-level functionality into 
> that. Basically just add another field.
> 
> >
> >> And if we can't detect that reliably, we're
> >> better off leaving it out for now, because sleeping guests is not
> >> obscure functionality, and accidentally nuking guests when a user sleeps
> >> them (presumably because they want to retain their working state) is
> >> much worse than telling a user to upgrade their agent, or not supported
> >> or whatever.
> >>
> >>>
> >>>> As an example, if S3 is broken in current QEMU, then we should not be
> >>>> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
> >>>> to return false, at which point the guest agent can send back a nice error
> >>>> message 'Suspend is not supported on this host', instead of just having the
> >>>> guest try to suspend & hang or worse.
> >>>
> >>
> >
> 


* Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
  2012-01-06 19:04               ` Luiz Capitulino
@ 2012-01-06 21:03                 ` Michael Roth
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Roth @ 2012-01-06 21:03 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: amit.shah, jcody, qemu-devel

On 01/06/2012 01:04 PM, Luiz Capitulino wrote:
> On Thu, 05 Jan 2012 15:41:33 -0600
> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>
>> On 01/05/2012 02:25 PM, Luiz Capitulino wrote:
>>> On Thu, 05 Jan 2012 09:10:50 -0600
>>> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>>>
>>>> On 01/05/2012 08:42 AM, Luiz Capitulino wrote:
>>>>> On Thu, 5 Jan 2012 12:59:27 +0000
>>>>> "Daniel P. Berrange" <berrange@redhat.com> wrote:
>>>>>
>>>>>> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
>>>>>>> On Thu, 5 Jan 2012 10:16:30 +0000
>>>>>>> "Daniel P. Berrange" <berrange@redhat.com> wrote:
>>>>>>>
>>>>>>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
>>>>>>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
>>>>>>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
>>>>>>>>> for now.
>>>>>>>>
>>>>>>>> IMHO this is short-sighted. When the bugs in QEMU are fixed so that
>>>>>>>> these modes work, you have needlessly put users in the situation where
>>>>>>>> they have to now upgrade the guest agent everywhere to take advantage
>>>>>>>> of the bugfix.
>>>>>>>
>>>>>>> That was my thinking until v4. But after discussing with Michael the issues
>>>>>>> we have with S3 I concluded that it doesn't make sense to offer an API to
>>>>>>> something that doesn't work, this will just generate bug reports. Also,
>>>>>>> updating to get new features is normal and expected.
>>>>>>
>>>>>> This is assuming that users will always upgrade their VMs & hosts in
>>>>>> lock step, which I rather doubt they will in practice. eg imagine a
>>>>>> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
>>>>>> and QEMU 1.2 (working S3). If they build VM disk images they will likely
>>>>>> use the QEMU GA from 1.2 for all their builds, even if many of them
>>>>>> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
>>>>>> 'hybrid' commands available in the guest agent, even though the host
>>>>>> QEMU doesn't work properly.
>>>>>>
>>>>>> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
>>>>>> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
>>>>>> this during 1.1, you may well find yourself in an un-winnable situation
>>>>>> for 1.2, where it is impossible to provide good behaviour on old hosts.
>>>>>>
>>>>>> So IMHO we are better off in the long run, if we include all commands
>>>>>> right now, even though some don't work yet, and work to ensure we have
>>>>>> good error reporting behaviour for those that don't work.
>>>>>
>>>>> Yes, I agree. As a side note: if we add error reporting it will only work
>>>>> on 1.1 and later.  Ie, the problem you describe above will still happen
>>>>> with 1.0.
>>>>>
>>>>> But what you're suggesting seems to be the right thing to do. Do you agree
>>>>> Michael?
>>>>
>>>> Agree, but unless we add an RPC that QEMU uses to advertise
>>>> capabilities, I'm really not sure it's possible to detect whether or not
>>>> the host will support it.
>>>
>>> You mean an RPC to advertise if 'sleep' is supported? I think this is best done
>>> by making guest-suspend return an error as suggested by Daniel, otherwise a
>>> client that doesn't query for capabilities might run into trouble.
>>
>> Agreed, but what I mean is that if the user executes the suspend using
>> an up-level agent running on a down-level 1.0 host, the agent will still
>> see s3 advertised and issue the buggy suspend. That's why I suggested
>> the host->agent capabilities reporting as a possible (but somewhat ugly)
>> way to just simply tell the agent it can handle it (and, lacking that,
>> assume that it can't).
>
> That makes sense.
>
>>
>>>
>>> There's an important detail though: we need to make qemu not advertise S3 for
>>> this to work. However, we might be able to fix S3 for 1.1 (and bugs, like the
>>> S4 ones, can't be detected, limiting the scope of the 'unsupported' error).
>>>
>>> So, we could merge all modes and commit to get S3 fixed for 1.1 :)
>>
>> No disagreement there, if we can commit to making qemu-ga/qemu 1.1
>> releases interoperable in this manner (whether by fixing s3 or not
>> advertising it), I think that approach is perfectly fine, ideal even.
>> Doing a 1.1 release where qemu and qemu-ga are not interoperable (qemu
>> missing s3 support, qemu-ga using s3) was my main objection.
>
> I see.
>
>> But there is a 2nd topic here I'm trying to mull over: what is qemu-ga's
>> support policy for down-level hosts? backward-compatible? incompatible?
>
> That's a good question, I think we should be backward-compatible, but I think
> that's not going to be trivial.
>
>> The above approach to this problem suggests the latter (qemu-ga 1.1 has
>> RPCs that will knowingly break 1.0 qemu instances). We could solve this
>> by introducing the capabilities negotiation I mentioned earlier. It
>> actually wouldn't need to be anything other than qemu telling qemu-ga
>> what qemu-ga version-level it supports. By default we assume 1.0, and
>> limit qemu-ga to that until qemu-ga is told otherwise (so, no
>> sleep/hybrid suspend modes). For new RPCs we may be able to handle this
>> version automatically, since we include qemu version levels for the RPCs
>> in the schema. For functionality within an RPC (like sleep/hybrid
>> suspend modes) we could use conditional code.
>>
>> If we take that approach (maintaining backward-compatibility), we'd need
>> to introduce that code in the agent now, and require qemu/libvirt
>> execute the guest-set-support-level RPC or whatever to access these 1.1
>> features.
>
> What does guest-set-support-level do? Does it enable all post-1.1 features?

Well, that was my initial thought (we set host version level N, all 
RPCs/fields introduced after N are made unavailable). But if we added, 
say, a new optional parameter or RPC that wasn't dependent on a 
particular QEMU version, there's no reason to hide them from host 
programs higher up the stack (which may be aware of the new features, 
but are paired with older QEMU versions for whatever reason and so can't 
bump the support level above 1.0 without risking breakage for other stuff).

So, guest-set-support-level(N) enables all features that were marked as 
requiring QEMU version N. New features with no such dependencies 
(optional params, new RPCs) would be unguarded/enabled by default.
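
Roughly, in Python terms (an illustrative sketch only, not agent code; the feature table, the Agent class, and the choice to treat hibernate as having no QEMU version dependency are all assumptions made for the example). I'm reading "requiring QEMU version N" as "requiring N or anything older":

```python
def parse_version(text):
    """Turn '1.1' into (1, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical table: feature -> minimum QEMU version, or None if unguarded.
FEATURES = {
    "guest-suspend/hibernate": None,
    "guest-suspend/sleep": "1.1",
    "guest-suspend/hybrid": "1.1",
}


class Agent:
    def __init__(self):
        # Assume a 1.0 host until the host side tells us otherwise.
        self.support_level = parse_version("1.0")

    def set_support_level(self, version):
        """What a guest-set-support-level(N) RPC would do on the agent side."""
        self.support_level = parse_version(version)

    def is_enabled(self, feature):
        required = FEATURES[feature]
        if required is None:
            # No QEMU dependency: enabled by default.
            return True
        return parse_version(required) <= self.support_level
```

With the default level of 1.0, only the unguarded entries are usable; after set_support_level("1.1") the sleep and hybrid entries become available as well.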

>
> A different approach would be to add a new field in the command dict in
> the schema file, say 'broken-in-qemu-version', and change qemu-ga to check
> that field in its main loop before executing a command. If
> 'broken-in-qemu-version' <= qemu version, qemu-ga returns a 'not supported'
> error.

Yah, still not sure what the best way to implement the check is. Though, 
I'd prefer the "positive" approach: 'requires[-at-least]-qemu-version'.
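
Something along these lines, as a Python sketch of the dispatch-time check (purely illustrative: the schema entries, the 'guest-example-new' command, the dispatch() helper and the error class names are placeholders, and 'requires-qemu-version' is just the field name floated above):

```python
def parse_version(text):
    """Turn '1.1' into (1, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical schema entries keyed by command name.
SCHEMA = {
    "guest-ping": {},
    "guest-example-new": {"requires-qemu-version": "1.1"},
}


def dispatch(command, host_qemu_version, handlers):
    """Check the schema's version requirement before running a command."""
    entry = SCHEMA.get(command)
    if entry is None:
        return {"error": {"class": "CommandNotFound",
                          "data": {"name": command}}}
    required = entry.get("requires-qemu-version")
    if required and parse_version(host_qemu_version) < parse_version(required):
        # Host is older than what the command needs: refuse up front.
        return {"error": {"class": "Unsupported", "data": {}}}
    return {"return": handlers[command]()}
```

Commands without the field are dispatched regardless of the host version, which matches the "unguarded by default" behaviour described earlier.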

>
> For commands like the guest-suspend which is partially supported, we'd have
> to do a manual check for the qemu version as you suggested above.

Agreed, and just document qemu version dependencies in the schema. That
may be a reasonable approach for the above as well: if we introduce an RPC
that requires a certain qemu version we just stick a version check at
the beginning and bail if it fails. We could always get fancy with it
later. Would make it easier to include this data in guest-info though...
I'll look at it more and whip up a patch soon.

>
> That's just an idea though, I'm not sure what's the best way to do this.
>
>>
>> Technically, there's a required RPC qemu-ga clients need to execute
>> already: guest-sync. It's required because we have no way to reliably
>> detect EOF over virtio-serial, and thus an agent may send stale data to
>> a newly-connected qemu-ga client, so the client needs to do the
>> guest-sync command to find the expected response and re-sync the
>> streams. We could roll the guest-set-support-level functionality into
>> that. Basically just add another field.
>>
>>>
>>>> And if we can't detect that reliably, we're
>>>> better off leaving it out for now, because sleeping guests is not
>>>> obscure functionality, and accidentally nuking guests when a user sleeps
>>>> them (presumably because they want to retain their working state) is
>>>> much worse than telling a user to upgrade their agent, or not supported
>>>> or whatever.
>>>>
>>>>>
>>>>>> As an example, if S3 is broken in current QEMU, then we should not be
>>>>>> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
>>>>>> to return false, at which point the guest agent can send back a nice error
>>>>>> message 'Suspend is not supported on this host', instead of just having the
>>>>>> guest try to suspend & hang or worse.
>>>>>
>>>>
>>>
>>
>


end of thread, other threads:[~2012-01-06 21:03 UTC | newest]

Thread overview: 21+ messages
2012-01-04 19:45 [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command Luiz Capitulino
2012-01-04 19:45 ` [Qemu-devel] [PATCH 1/2] qemu-ga: set O_NONBLOCK for serial channels Luiz Capitulino
2012-01-04 19:55   ` Michael Roth
2012-01-04 19:45 ` [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command Luiz Capitulino
2012-01-04 20:00   ` Michael Roth
2012-01-04 20:03   ` Eric Blake
2012-01-05 12:29     ` Luiz Capitulino
2012-01-05 12:46   ` Daniel P. Berrange
2012-01-05 12:58     ` Luiz Capitulino
2012-01-05 10:16 ` [Qemu-devel] [PATCH v4 0/2]: " Daniel P. Berrange
2012-01-05 12:37   ` Luiz Capitulino
2012-01-05 12:59     ` Daniel P. Berrange
2012-01-05 14:42       ` Luiz Capitulino
2012-01-05 15:10         ` Michael Roth
2012-01-05 20:25           ` Luiz Capitulino
2012-01-05 21:41             ` Michael Roth
2012-01-06 19:04               ` Luiz Capitulino
2012-01-06 21:03                 ` Michael Roth
2012-01-05 15:04       ` Michael Roth
2012-01-05 15:11         ` Daniel P. Berrange
2012-01-05 15:18           ` Michael Roth
