Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command

qemu-devel.nongnu.org archive mirror

From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: Luiz Capitulino <lcapitulino@redhat.com>
Cc: amit.shah@redhat.com, jcody@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command
Date: Fri, 06 Jan 2012 15:03:24 -0600
Message-ID: <4F07619C.60100@linux.vnet.ibm.com>
In-Reply-To: <20120106170439.02292f6f@doriath>

On 01/06/2012 01:04 PM, Luiz Capitulino wrote:
> On Thu, 05 Jan 2012 15:41:33 -0600
> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>
>> On 01/05/2012 02:25 PM, Luiz Capitulino wrote:
>>> On Thu, 05 Jan 2012 09:10:50 -0600
>>> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>>>
>>>> On 01/05/2012 08:42 AM, Luiz Capitulino wrote:
>>>>> On Thu, 5 Jan 2012 12:59:27 +0000
>>>>> "Daniel P. Berrange" <berrange@redhat.com> wrote:
>>>>>
>>>>>> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
>>>>>>> On Thu, 5 Jan 2012 10:16:30 +0000
>>>>>>> "Daniel P. Berrange" <berrange@redhat.com> wrote:
>>>>>>>
>>>>>>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
>>>>>>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
>>>>>>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
>>>>>>>>> for now.
>>>>>>>>
>>>>>>>> IMHO this is short-sighted. When the bugs in QEMU are fixed so that
>>>>>>>> these modes work, you have needlessly put users in the situation where
>>>>>>>> they have to now upgrade the guest agent everywhere to take advantage
>>>>>>>> of the bugfix.
>>>>>>>
>>>>>>> That was my thinking until v4. But after discussing with Michael the issues
>>>>>>> we have with S3 I concluded that it doesn't make sense to offer an API to
>>>>>>> something that doesn't work, this will just generate bug reports. Also,
>>>>>>> updating to get new features is normal and expected.
>>>>>>
>>>>>> This is assuming that users will always upgrade their VMs & hosts in
>>>>>> lock step, which I rather doubt they will in practice. eg imagine a
>>>>>> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
>>>>>> and QEMU 1.2 (working S3). If they build VM disk images they will likely
>>>>>> use the QEMU GA from 1.2 for all their builds, even if many of them
>>>>>> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
>>>>>> 'hybrid' commands available in the guest agent, even though the host
>>>>>> QEMU doesn't work properly.
>>>>>>
>>>>>> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
>>>>>> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
>>>>>> this during 1.1, you may well find yourself in an un-winnable situation
>>>>>> for 1.2, where it is impossible to provide good behaviour on old hosts.
>>>>>>
>>>>>> So IMHO we are better off in the long run, if we include all commands
>>>>>> right now, even though some don't work yet, and work to ensure we have
>>>>>> good error reporting behaviour for those that don't work.
>>>>>
>>>>> Yes, I agree. As a side note: if we add error reporting it will only work
>>>>> on 1.1 and later.  Ie, the problem you describe above will still happen
>>>>> with 1.0.
>>>>>
>>>>> But what you're suggesting seems to be the right thing to do. Do you agree
>>>>> Michael?
>>>>
>>>> Agree, but unless we add an RPC that QEMU uses to advertise
>>>> capabilities, I'm really not sure it's possible to detect whether or not
>>>> the host will support it.
>>>
>>> You mean an RPC to advertise if 'sleep' is supported? I think this is best done
>>> by making guest-suspend return an error as suggested by Daniel, otherwise a
>>> client that doesn't query for capabilities might run into trouble.
>>
>> Agreed, but what I mean is that if the user executes the suspend using
>> an up-level agent running on a down-level 1.0 host, the agent will still
>> see s3 advertised and issue the buggy suspend. That's why I suggested
>> the host->agent capabilities reporting as a possible (but somewhat ugly)
>> way to just simply tell the agent it can handle it (and, lacking that,
>> assume that it can't).
>
> That makes sense.
>
>>
>>>
>>> There's an important detail though: we need to make qemu not advertise S3 for
>>> this to work. However, we might be able to fix S3 for 1.1 (and bugs, like the
>>> S4 ones, can't be detected, limiting the scope of the 'unsupported' error).
>>>
>>> So, we could merge all modes and commit to get S3 fixed for 1.1 :)
>>
>> No disagreement there, if we can commit to making qemu-ga/qemu 1.1
>> releases interoperable in this manner (whether by fixing s3 or not
>> advertising it), I think that approach is perfectly fine, ideal even.
>> Doing a 1.1 release where qemu and qemu-ga are not interoperable (qemu
>> missing s3 support, qemu-ga using s3) was my main objection.
>
> I see.
>
>> But there is a 2nd topic here I'm trying to mull over: what is qemu-ga's
>> support policy for down-level hosts? backward-compatible? incompatible?
>
> That's a good question, I think we should be backward-compatible, but I think
> that's not going to be trivial.
>
>> The above approach to this problem suggests the latter (qemu-ga 1.1 has
>> RPCs that will knowingly break 1.0 qemu instances). We could solve this
>> by introducing the capabilities negotiation I mentioned earlier. It
>> actually wouldn't need to be anything other than qemu telling qemu-ga
>> what qemu-ga version-level it supports. By default we assume 1.0, and
>> limit qemu-ga to that until qemu-ga is told otherwise (so, no
>> sleep/hybrid suspend modes). For new RPCs we may be able to handle this
>> version automatically, since we include qemu version levels for the RPCs
>> in the schema. For functionality within an RPC (like sleep/hybrid
>> suspend modes) we could use conditional code.
>>
>> If we take that approach (maintaining backward-compatibility), we'd need
>> to introduce that code in the agent now, and require qemu/libvirt
>> execute the guest-set-support-level RPC or whatever to access these 1.1
>> features.
>
> What does guest-set-support-level do? It enables all 1.1 post features?

Well, that was my initial thought (we set host version level N, all 
RPCs/fields introduced after N are made unavailable). But if we added, 
say, a new optional parameter or RPC that wasn't dependent on a 
particular QEMU version, there's no reason to hide them from host 
programs higher up the stack (which may be aware of the new features, 
but are paired with older QEMU versions for whatever reason and so can't 
bump the support level above 1.0 without risking breakage for other stuff).

So, guest-set-support-level(N) enables all features that were marked as 
requiring QEMU version N. New features with no such dependencies 
(optional params, new RPCs) would be unguarded/enabled by default.
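
A minimal sketch of the support-level idea above, written in Python for brevity rather than in qemu-ga's C. The feature names, the `Agent` class, and the assumption that 'sleep' and 'hybrid' require QEMU 1.1 are all hypothetical; it also assumes that level N enables everything requiring N or lower.

```python
# Hypothetical model of guest-set-support-level; not qemu-ga code.
DEFAULT_LEVEL = (1, 0)  # assume a 1.0 host until told otherwise

# Feature name -> minimum QEMU version it depends on, or None when the
# feature has no host dependency. Names and versions are illustrative.
FEATURES = {
    "guest-suspend:hibernate": None,
    "guest-suspend:sleep": (1, 1),
    "guest-suspend:hybrid": (1, 1),
}


class Agent:
    def __init__(self):
        self.support_level = DEFAULT_LEVEL

    def set_support_level(self, major, minor):
        """Model of guest-set-support-level(N)."""
        self.support_level = (major, minor)

    def enabled(self, feature):
        required = FEATURES[feature]
        # Unguarded features are always on; guarded ones need level >= N.
        return required is None or self.support_level >= required
```

With this model a fresh agent exposes only 'hibernate'; calling `set_support_level(1, 1)` unlocks the guarded modes.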

>
> A different approach would be to add a new field in the command dict in
> the schema file, say 'broken-in-qemu-version', and change qemu-ga to check
> that field in its main loop before executing a command. If
> 'broken-in-qemu-version' <= qemu version, qemu-ga returns a 'not supported'
> error.

Yah, still not sure what the best way to implement the check is. Though, 
I'd prefer the "positive" approach: 'requires[-at-least]-qemu-version'.
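
A rough illustration of the "positive" schema check, again in Python rather than qemu-ga's C. The schema layout, the `guest-new-rpc` command, the `dispatch` helper, and the error names are assumptions made for the sketch; only the 'requires-qemu-version' field name comes from the discussion above.

```python
def parse_version(text):
    """'1.1' -> (1, 1), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical schema entries; 'requires-qemu-version' is the proposed field.
SCHEMA = {
    "guest-ping": {},
    "guest-new-rpc": {"requires-qemu-version": "1.1"},
}


def dispatch(command, host_version, handlers):
    """Check the schema before running a command's handler."""
    entry = SCHEMA.get(command)
    if entry is None:
        return {"error": "CommandNotFound"}
    required = entry.get("requires-qemu-version")
    if required and parse_version(host_version) < parse_version(required):
        # Host is too old for this command: refuse instead of running it.
        return {"error": "Unsupported", "requires": required}
    return {"return": handlers[command]()}
```

Because the check lives in the dispatcher, individual handlers need no version logic, and the same field could be reported back to clients.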

>
> For commands like the guest-suspend which is partially supported, we'd have
> to do a manual check for the qemu version as you suggested above.

Agreed, and just document qemu version dependencies in the schema. That
may be a reasonable approach for the above as well: if we introduce an RPC
that requires a certain qemu version we just stick a version check at
the beginning and bail if it fails. We could always get fancy with it
later. Having it in the schema would make it easier to include this data
in guest-info, though... I'll look at it more and whip up a patch soon.
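
The simpler in-handler variant might look like the following Python sketch (not qemu-ga's C). The `Unsupported` exception, the `do_suspend` callback, the way the host version reaches the handler, and the 1.1 requirement for 'sleep' and 'hybrid' are all assumptions.

```python
class Unsupported(Exception):
    """Raised when the host is too old for the requested mode."""


# Modes assumed to depend on host-side fixes; the version is illustrative.
MODE_REQUIRES = {"sleep": (1, 1), "hybrid": (1, 1)}
VALID_MODES = ("hibernate", "sleep", "hybrid")


def guest_suspend(mode, host_version, do_suspend):
    if mode not in VALID_MODES:
        raise ValueError("unknown suspend mode: %s" % mode)
    required = MODE_REQUIRES.get(mode)
    # Version check at the top of the handler: bail before touching the guest.
    if required is not None and host_version < required:
        raise Unsupported("mode '%s' requires QEMU %d.%d" % ((mode,) + required))
    do_suspend(mode)
```

Here 'hibernate' always proceeds, while 'sleep' and 'hybrid' fail cleanly on a host reported as older than 1.1.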

>
> That's just an idea though, I'm not sure what's the best way to do this.
>
>>
>> Technically, there's a required RPC qemu-ga clients need to execute
>> already: guest-sync. It's required because we have no way to reliably
>> detect EOF over virtio-serial, and thus an agent may send stale data to
>> a newly-connected qemu-ga client, so the client needs to do the
>> guest-sync command to find the expected response and re-sync the
>> streams. We could roll the guest-set-support-level functionality into
>> that. Basically just add another field.
>>
>>>
>>>> And if we can't detect that reliably, we're
>>>> better off leaving it out for now, because sleeping guests is not
>>>> obscure functionality, and accidentally nuking guests when a user sleeps
>>>> them (presumably because they want to retain their working state) is
>>>> much worse than telling a user to upgrade their agent, or not supported
>>>> or whatever.
>>>>
>>>>>
>>>>>> As an example, if S3 is broken in current QEMU, then we should not be
>>>>>> advertizing S3 to the guest OS. This would allow 'pm-is-supported --suspend'
>>>>>> to return false, at which point the guest agent can send back a nice error
>>>>>> message 'Suspend is not supported on this host', instead of just having the
>>>>>> guest try to suspend & hang or worse.
>>>>>
>>>>
>>>
>>
>

Thread overview: 21+ messages
2012-01-04 19:45 [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command Luiz Capitulino
2012-01-04 19:45 ` [Qemu-devel] [PATCH 1/2] qemu-ga: set O_NONBLOCK for serial channels Luiz Capitulino
2012-01-04 19:55   ` Michael Roth
2012-01-04 19:45 ` [Qemu-devel] [PATCH 2/2] qemu-ga: Add the guest-suspend command Luiz Capitulino
2012-01-04 20:00   ` Michael Roth
2012-01-04 20:03   ` Eric Blake
2012-01-05 12:29     ` Luiz Capitulino
2012-01-05 12:46   ` Daniel P. Berrange
2012-01-05 12:58     ` Luiz Capitulino
2012-01-05 10:16 ` [Qemu-devel] [PATCH v4 0/2]: " Daniel P. Berrange
2012-01-05 12:37   ` Luiz Capitulino
2012-01-05 12:59     ` Daniel P. Berrange
2012-01-05 14:42       ` Luiz Capitulino
2012-01-05 15:10         ` Michael Roth
2012-01-05 20:25           ` Luiz Capitulino
2012-01-05 21:41             ` Michael Roth
2012-01-06 19:04               ` Luiz Capitulino
2012-01-06 21:03                 ` Michael Roth [this message]
2012-01-05 15:04       ` Michael Roth
2012-01-05 15:11         ` Daniel P. Berrange
2012-01-05 15:18           ` Michael Roth
