From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:44288) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Rb06h-0000Rf-5q for qemu-devel@nongnu.org; Wed, 14 Dec 2011 20:27:36 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Rb06W-00038Q-57 for qemu-devel@nongnu.org; Wed, 14 Dec 2011 20:27:35 -0500 Received: from e38.co.us.ibm.com ([32.97.110.159]:41510) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Rb06V-00038D-RM for qemu-devel@nongnu.org; Wed, 14 Dec 2011 20:27:24 -0500 Received: from /spool/local by e38.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 14 Dec 2011 18:27:22 -0700 Received: from d03av06.boulder.ibm.com (d03av06.boulder.ibm.com [9.17.195.245]) by d03relay01.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id pBF1RKA3128466 for ; Wed, 14 Dec 2011 18:27:20 -0700 Received: from d03av06.boulder.ibm.com (loopback [127.0.0.1]) by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id pBF1RK2x019444 for ; Wed, 14 Dec 2011 18:27:20 -0700 Message-ID: <4EE94CF7.2070601@linux.vnet.ibm.com> Date: Wed, 14 Dec 2011 19:27:19 -0600 From: Michael Roth MIME-Version: 1.0 References: <20111213162850.4cd135a3@doriath> <4EE7AF7C.2060302@linux.vnet.ibm.com> <20111214110029.4549bf90@doriath> <4EE8C6B5.1010406@linux.vnet.ibm.com> <20111214143855.5652b0cb@doriath> <20111214161751.543e0b02@doriath> <4EE8FC5B.9040807@linux.vnet.ibm.com> <20111214180615.21e47786@doriath> <4EE90D81.1030506@linux.vnet.ibm.com> <4EE911C0.2000007@linux.vnet.ibm.com> <20111214215601.5820b735@doriath> In-Reply-To: <20111214215601.5820b735@doriath> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2] qemu-ga: Add the guest-suspend command List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Luiz Capitulino Cc: Amit Shah , aliguori@us.ibm.com, qemu-devel On 12/14/2011 05:56 PM, Luiz Capitulino wrote: > On Wed, 14 Dec 2011 15:14:40 -0600 > Michael Roth wrote: > >> On 12/14/2011 02:56 PM, Michael Roth wrote: >>> On 12/14/2011 02:06 PM, Luiz Capitulino wrote: >>>> On Wed, 14 Dec 2011 13:43:23 -0600 >>>> Michael Roth wrote: >>>> >>>>> On 12/14/2011 12:17 PM, Luiz Capitulino wrote: >>>>>> On Wed, 14 Dec 2011 14:38:55 -0200 >>>>>> Luiz Capitulino wrote: >>>>>> >>>>>>>>> I'm also wondering if we could use g_child_watch_add(), but it's >>>>>>>>> not clear >>>>>>>>> to me if it works with processes not created with g_spawn_*() >>>>>>>>> functions. >>>>>>>> >>>>>>>> GPid's map to something other than PIDs on Windows, so I think >>>>>>>> we'd have >>>>>>>> issues there. But our fork() approach probably wouldn't work at >>>>>>>> all on >>>>>>>> Windows except maybe under cygwin, so at some point we'd probably >>>>>>>> want >>>>>>>> to switch over to g_spawn for this kind of stuff anyway... >>>>>>>> >>>>>>>> So this might be a good point to switch over to using the glib >>>>>>>> functions. >>>>>>>> >>>>>>>> Would you mind trying to do the hibernate/zombie reaping stuff using >>>>>>>> g_spawn+g_child_watch_add()? It might end up being the easiest route. >>>>>>>> Otherwise I can take a look at it later today. >>>>>>> >>>>>>> Well, there are two problems with g_spawn wrt to the manual method of >>>>>>> writing to the sysfs file. The first one is that I'm not sure if >>>>>>> g_spawn() >>>>>>> reports the file not found error synchronously. The other problem >>>>>>> is that, >>>>>>> I'd have to fork() anyway to write to the sysfs file (unless we >>>>>>> decide that >>>>>>> it's ok to do this synchronously, which seems ok to me). >>>>>> >>>>>> The version below uses g_spawn_async(). The code is a bit simpler >>>>>> than previous >>>>>> versions, but there are two important details about it: >>>>>> >>>>>> 1. I'm letting g_spawn_async() reap the child automatically. I don't >>>>>> know >>>>>> how it does it though. I'd guess it uses g_child_watch_add(), worst >>>>>> case it ignores SIGCHLD (although I think this would be awful) >>>>> >>>>> Weird, doesn't seem to add a watcher or do anything with SIGCHLD. You're >>>>> not seeing an error that's causing it to reap it immediately after >>>>> the fork? >>>> >>>> No. I haven't tested it much though, but the basic use case seemed to >>>> work >>>> fine. >>>> >>>>>> >>>>>> 2. The manual method of writing to the sysfs is done synchronously. >>>>>> This >>>>>> means that the command response will only be sent when the guest >>>>>> resumes >>>>> >>>>> Want to avoid sync blocking calls as much as possible until timeouts are >>>>> sorted out, particularly in this case. Currently, with hibernate at >>>>> least, QEMU will exit (or is something being done to change that >>>>> behavior?), libvirt will restart the VM later via some other interface, >>>>> establish a connection to a new monitor and new agent channel, then gets >>>>> a spurious response from the agent for a command it issue to a different >>>>> QEMU process far in the past. So the common scenario would need nasty, >>>>> special handling in libvirt and they'd probably hate us for it. >>>> >>>> Problem is: even with the other approach there's no way to guarantee that >>>> the response will arrive before the guest suspends or shuts down, as it >>>> depends on which process runs first. >>> >>>> If we want to guarantee that the client will see a response before the >>>> guest suspends, then we need some form of synchronization. Like, the >>>> child >>>> should wait for the parent to send the response to the client before >>>> exec()ing (or writing to the sysfs file). >>>> >>> >>> There's no way to guarantee it though, even if we remove the window of >> >> Rather, there's no way to guarantee the client will get a response (or >> that, given a response, the async command will succeed). > > Well, it's the best we can do, ie. only allow the child to run when the > parent is done sending the response. For the error case, we don't report > errors in the child back to the client anyway, we do it via syslog. > >> But you're that we could make it so that time-out/lack of response at >> least implies that the command *won't* be carried out... >> >> That might be a nice guarantee, but your call on whether you want to do >> the signalling approach for hibernate. Otherwise, I think async will >> work well enough in the common case, and response timeout would imply >> neither success nor failure of the request, which I think is the norm >> for most things. > > Yes, I think the signaling approach is complex and I'm not sure we really > we have a problem here. Maybe I should do it async for now, if this turns > out to be a problem for libvirt we can try more complex approaches. > Agreed, it reduces corner-cases, but still requires similar error-handling in place, so it's a good candidate for treating as a future optimization. Although... >> >>> opportunity for a child to complete a shutdown/suspend/etc request >>> before we write out a response, there's still the potential scenario >>> where an admin kills off/restarts qemu-ga while it's processing a request. >>> >>> So relying on client-side timeouts to handle these corner cases is >>> acceptable, since being able to handle that is a requirement of a robust >>> client, especially since we're talking to a potentially malicious >>> guest/agent. We just need to do our best to make these situations corner >>> cases rather than common ones. Anthony rightly pointed out on IRC that punting the problem to a client-side timeout still requires a reset mechanism in the case of suspend/hibernate, since, although it won't happen every single time as it would with a synchronous call, we can still send a spurious response when the guest wakes up (if we slept before delivery). So the libvirt implementation will need to do a reset every every time they reconnect after waking up a suspended/hibernated guest. It should also be part of the general timeout logic, since there's no guarantee the timeout wasn't premature and a spurious response won't still arrive. I'll get the logic for doing qemu-ga resets added to the wiki... it's not pretty, and I was hoping to hide it away in the QMP integration since QEMU's JSON parser is already prepped to handle the reserved 0xFF flush token that's required...but it works at least. >>> >>>>> >>>>> So probably best to stick with your previous approach for now, and look >>>>> for a glib solution later. >>>>> >>>>>> >>>>>> If you think this approach is acceptable, I'll test it more, update >>>>>> its doc, >>>>>> etc and post it again. >>>>>> >>>>>> diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json >>>>>> index 5f8a18d..63f65a6 100644 >>>>>> --- a/qapi-schema-guest.json >>>>>> +++ b/qapi-schema-guest.json >>>>>> @@ -219,3 +219,20 @@ >>>>>> ## >>>>>> { 'command': 'guest-fsfreeze-thaw', >>>>>> 'returns': 'int' } >>>>>> + >>>>>> +## >>>>>> +# @guest-suspend >>>>>> +# >>>>>> +# Suspend guest execution by entering ACPI power state S3 or S4. >>>>>> +# >>>>>> +# @mode: 'hibernate' RAM content is saved in the disk and the guest is >>>>>> +# powered down (this corresponds to ACPI S4) >>>>>> +# 'sleep' execution is suspended but the RAM retains its contents >>>>>> +# (this corresponds to ACPI S3) >>>>>> +# >>>>>> +# Notes: This is an asynchronous request. There's no guarantee it will >>>>>> +# succeed. Errors will be logged to guest's syslog. >>>>>> +# >>>>>> +# Since: 1.1 >>>>>> +## >>>>>> +{ 'command': 'guest-suspend', 'data': { 'mode': 'str' } } >>>>>> diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c >>>>>> index a09c8ca..0c6b78e 100644 >>>>>> --- a/qga/guest-agent-commands.c >>>>>> +++ b/qga/guest-agent-commands.c >>>>>> @@ -574,6 +574,56 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err) >>>>>> } >>>>>> #endif >>>>>> >>>>>> +#define LINUX_SYS_STATE_FILE "/sys/power/state" >>>>>> + >>>>>> +void qmp_guest_suspend(const char *mode, Error **err) >>>>>> +{ >>>>>> + GError *error = NULL; >>>>>> + gchar *argv[2]; >>>>>> + >>>>>> + if (strcmp(mode, "hibernate") == 0) { >>>>>> + argv[0] = (gchar *) "pm-hibernate"; >>>>>> + } else if (strcmp(mode, "sleep") == 0) { >>>>>> + argv[0] = (gchar *) "pm-suspend"; >>>>>> + } else if (strcmp(mode, "hybrid") == 0) { >>>>>> + argv[0] = (gchar *) "pm-hybrid"; >>>>>> + } else { >>>>>> + error_set(err, QERR_INVALID_PARAMETER, "mode"); >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + argv[1] = NULL; >>>>>> + if (g_spawn_async(NULL, argv, NULL,G_SPAWN_SEARCH_PATH | >>>>>> + G_SPAWN_FILE_AND_ARGV_ZERO, >>>>>> + NULL, NULL, NULL,&error)< 0) { >>>>>> + int fd; >>>>>> + const char *cmd; >>>>>> + >>>>>> + slog("%s\n", error->message); >>>>>> + g_error_free(error); >>>>>> + >>>>>> + if (strcmp(mode, "hybrid") == 0) { >>>>>> + error_set(err, QERR_UNDEFINED_ERROR); >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + slog("trying to suspend using the manual method...\n"); >>>>>> + >>>>>> + fd = open(LINUX_SYS_STATE_FILE, O_WRONLY); >>>>>> + if (fd< 0) { >>>>>> + error_set(err, QERR_OPEN_FILE_FAILED, LINUX_SYS_STATE_FILE); >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + cmd = strcmp(mode, "sleep") == 0 ? "mem" : "disk"; >>>>>> + if (write(fd, cmd, strlen(cmd))< 0) { >>>>>> + error_set(err, QERR_IO_ERROR); >>>>>> + } >>>>>> + >>>>>> + close(fd); >>>>>> + } >>>>>> +} >>>>>> + >>>>>> /* register init/cleanup routines for stateful command groups */ >>>>>> void ga_command_state_init(GAState *s, GACommandState *cs) >>>>>> { >>>>> >>>> >>> >> >