Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
       [not found] ` <1400756850-19807-4-git-send-email-laine@laine.org>
@ 2014-05-22 19:33   ` Eric Blake
  2014-05-23  3:50     ` Marcelo Tosatti
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Blake @ 2014-05-22 19:33 UTC (permalink / raw)
  To: Laine Stump, libvir-list, mtosatti, qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 6856 bytes --]

[Adding qemu]

On 05/22/2014 05:07 AM, Laine Stump wrote:
> commit e31b5cf393857 attempted to fix libvirt's
> VIR_DOMAIN_EVENT_ID_RTC_CHANGE, which is documentated to always

s/documentated/documented/

> provide the new offset of the domain's real time clock from UTC. The
> problem was that, in the case that qemu is provided with an "-rtc
> base=x" where x is an absolute time (rather than "utc" or
> "localtime"), the offset sent by qemu's RTC_CHANGE event is *not* the
> new offset from UTC, but rather is the sum of all changes to the
> domain's RTC since it was started with base=x.
> 
> So, despite what was said in commit e31b5cf393857, if we assume that
> the original value stored in "adjustment" was the offset from UTC at
> the time the domain was started, we can always determine the current
> offset from UTC by simply adding the most recent (i.e. current) offset
> from qemu to that original adjustment.

Is this true even if we miss an RTC update event from qemu?  I'm worried
about the following situation:

user prepares to do a libvirtd upgrade, so libvirtd is shut down. Then
the guest triggers an RTC update, so qemu sends an event, but the event
is lost. Then libvirtd starts again, and doesn't realize the event is lost.

Do we need more help from qemu, such as a new field to an existing QMP
command (or a new QMP command) that lists the cumulative offset that
qemu is using, where we call that query command any time after an RTC
update event or after a libvirtd restart? I'm wondering if this is more
a bug in qemu for not providing the right information rather than
libvirt's responsibility to work around it.  If the only way to keep
accurate information is to sum the values we get from events, we are at
risk of a lost event getting us messed up.

> 
> This patch accomplishes that by storing the initial adjustment in the
> domain's status as "adjustment0". Each time a new RTC_CHANGE event is
> received from qemu, we simply add adjustment0 to the value sent by
> qemu, store that as the new adjustment, and forward that value on to
> any event handler.
> 
> This patch (*not* e31b5cf393857, which should be reverted prior to
> applying this patch) fixes:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=964177
> 
> (for the case where basis='utc'. It does not fix basis='localtime')
> ---
> 
> Changes from V1: remove all attempts to fix basis='localtime' in favor
> of fixing it in a simpler and better manner in a separate patch.

I'd also appreciate it if the qemu developers can chime in on what is
supposed to happen for localtime guests.

There are at least four combinations:

host running on UTC bios time, guest running on UTC time (in my opinion,
the only sane setting, but we're talking about reality not sanity)

host running on UTC, guest running on localtime (perhaps the guest is
windows, and we know that windows prefers to run on localtime)

host running on localtime bios (perhaps because it is dual-boot with
windows, and windows prefers bios in localtime), guest running on UTC time

host running on localtime, guest running on localtime

But it gets even more complicated.  The host localtime need not be
consistent with the guest localtime.  That is, I could be a cloud
provider with servers on the east coast, and renting out processor time
to a client on the west coast that wants their guest tied to west coast
localtime.  And that's assuming that both host and guest switch in and
out of daylight savings at the same time, which falls apart when you
cross political boundaries.  Then there's the fun of migration (what if
my server farm is spread across multiple timezones - does migration take
into account the difference in localtime between source and destination
servers).

I can _totally_ understand the desire to run a GUEST in such a way that
the guest thinks it has a bios stored in localtime (and when the guest
updates the RTC twice a year to account for daylight savings, it changes
what offset we track about the guest).  But I think it is INSANITY to
ever try and run a host on a localtime system (daylight savings changes
in the host are just asking for problems to the guests) - so even if the
host is tied to localtime bios, it is still probably wiser for qemu to
base its offsets to UTC no matter what.  If the commandline allows a
specification of a localtime offset, I think it should be used ONLY for
a one-time up-front conversion into a corresponding UTC offset, and then
execute qemu in relation to utc thereafter (therefore, migration is
always done in terms of utc, without regards for whether source and
destination have a different localtime).

> 
>  src/conf/domain_conf.c  | 21 +++++++++++++++++----
>  src/conf/domain_conf.h  |  7 +++++++
>  src/qemu/qemu_command.c |  9 +++++++++
>  src/qemu/qemu_process.c | 27 ++++++++++++++++++++++-----
>  4 files changed, 55 insertions(+), 9 deletions(-)
> 
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c

> -    if (vm->def->clock.offset == VIR_DOMAIN_CLOCK_OFFSET_VARIABLE)
> +    if (vm->def->clock.offset == VIR_DOMAIN_CLOCK_OFFSET_VARIABLE) {
> +        /* when a basedate is manually given on the qemu commandline
> +         * rather than simply "-rtc base=utc", the offset sent by qemu
> +         * in this event is *not* the new offset from UTC, but is
> +         * instead the new offset from the *original basedate* +
> +         * uptime. For example, if the original offset was 3600 and
> +         * the guest clock has been advanced by 10 seconds, qemu will
> +         * send "10" in the event - this means that the new offset
> +         * from UTC is 3610, *not* 10. If the guest clock is advanced
> +         * by another 10 seconds, qemu will now send "20" - i.e. each
> +         * event is the sum of the most recent change and all previous
> +         * changes since the domain was started. Fortunately, we have
> +         * saved the initial offset in "adjustment0", so to arrive at
> +         * the proper new "adjustment", we just add the most recent
> +         * offset to adjustment0.
> +         */
> +        offset += vm->def->clock.data.variable.adjustment0;
>          vm->def->clock.data.variable.adjustment = offset;
>  
> -    if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0)
> -        VIR_WARN("unable to save domain status with RTC change");
> +        if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0)
> +           VIR_WARN("unable to save domain status with RTC change");
> +    }
> +
> +    event = virDomainEventRTCChangeNewFromObj(vm, offset);
>  
>      virObjectUnlock(vm);
>  
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-22 19:33   ` [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/> Eric Blake
@ 2014-05-23  3:50     ` Marcelo Tosatti
  2014-05-23  9:17       ` Laine Stump
  2014-05-23 12:43       ` Luiz Capitulino
  0 siblings, 2 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2014-05-23  3:50 UTC (permalink / raw)
  To: Eric Blake
  Cc: libvir-list, Luiz Capitulino, qemu-devel@nongnu.org, Laine Stump

On Thu, May 22, 2014 at 01:33:14PM -0600, Eric Blake wrote:
> [Adding qemu]
> 
> On 05/22/2014 05:07 AM, Laine Stump wrote:
> > commit e31b5cf393857 attempted to fix libvirt's
> > VIR_DOMAIN_EVENT_ID_RTC_CHANGE, which is documentated to always
> 
> s/documentated/documented/
> 
> > provide the new offset of the domain's real time clock from UTC. The
> > problem was that, in the case that qemu is provided with an "-rtc
> > base=x" where x is an absolute time (rather than "utc" or
> > "localtime"), the offset sent by qemu's RTC_CHANGE event is *not* the
> > new offset from UTC, but rather is the sum of all changes to the
> > domain's RTC since it was started with base=x.
> > 
> > So, despite what was said in commit e31b5cf393857, if we assume that
> > the original value stored in "adjustment" was the offset from UTC at
> > the time the domain was started, we can always determine the current
> > offset from UTC by simply adding the most recent (i.e. current) offset
> > from qemu to that original adjustment.
> 
> Is this true even if we miss an RTC update event from qemu?  I'm worried
> about the following situation:
> 
> user prepares to do a libvirtd upgrade, so libvirtd is shut down. 

If adjustment0 field is updated from adjustment, via a libvirtd shutdown, the current
patch will also break, i believe. Not sure if thats possible, though.

> Then the guest triggers an RTC update, so qemu sends an event, but the
> event is lost. Then libvirtd starts again, and doesn't realize the
> event is lost.

Yes, but that case is also true for any other QMP asynchronous event,
and therefore should be handled generically i suppose (QMP channel data
should be maintained across libvirtd shutdown). Luiz?

> Do we need more help from qemu, such as a new field to an existing QMP
> command (or a new QMP command) that lists the cumulative offset that
> qemu is using, where we call that query command any time after an RTC
> update event or after a libvirtd restart? I'm wondering if this is more
> a bug in qemu for not providing the right information rather than
> libvirt's responsibility to work around it.  If the only way to keep
> accurate information is to sum the values we get from events, we are at
> risk of a lost event getting us messed up.

Good point, unsure whether its specific to this command, though. Could
add a new QMP command to query the RTC offset, yes.

> > This patch accomplishes that by storing the initial adjustment in the
> > domain's status as "adjustment0". Each time a new RTC_CHANGE event is
> > received from qemu, we simply add adjustment0 to the value sent by
> > qemu, store that as the new adjustment, and forward that value on to
> > any event handler.
> > 
> > This patch (*not* e31b5cf393857, which should be reverted prior to
> > applying this patch) fixes:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=964177
> > 
> > (for the case where basis='utc'. It does not fix basis='localtime')
> > ---
> > 
> > Changes from V1: remove all attempts to fix basis='localtime' in favor
> > of fixing it in a simpler and better manner in a separate patch.
> 
> I'd also appreciate it if the qemu developers can chime in on what is
> supposed to happen for localtime guests.
> 
> There are at least four combinations:
> 
> host running on UTC bios time, guest running on UTC time (in my opinion,
> the only sane setting, but we're talking about reality not sanity)

1) Why does Windows keep your BIOS clock on local time?
http://blogs.msdn.com/b/oldnewthing/archive/2004/09/02/224672.aspx

2) Windows with RTC UTC
https://wiki.archlinux.org/index.php/Time#UTC_in_Windows

> host running on UTC, guest running on localtime (perhaps the guest is
> windows, and we know that windows prefers to run on localtime)

Its the default, thats all. See 1) above.

> host running on localtime bios (perhaps because it is dual-boot with
> windows, and windows prefers bios in localtime), guest running on UTC time

Can't see why hosts using localtime or UTC is relevant. Assume host is
synchronized to UTC via NTP (so you can use UTC or convert to localtime
if desired).

> host running on localtime, guest running on localtime
> 
> But it gets even more complicated.  The host localtime need not be
> consistent with the guest localtime.  That is, I could be a cloud
> provider with servers on the east coast, and renting out processor time
> to a client on the west coast that wants their guest tied to west coast
> localtime.  And that's assuming that both host and guest switch in and
> out of daylight savings at the same time, which falls apart when you
> cross political boundaries.  Then there's the fun of migration (what if
> my server farm is spread across multiple timezones - does migration take
> into account the difference in localtime between source and destination
> servers).
> 
> I can _totally_ understand the desire to run a GUEST in such a way that
> the guest thinks it has a bios stored in localtime (and when the guest
> updates the RTC twice a year to account for daylight savings, it changes
> what offset we track about the guest).  But I think it is INSANITY to
> ever try and run a host on a localtime system (daylight savings changes
> in the host are just asking for problems to the guests) - so even if the
> host is tied to localtime bios, it is still probably wiser for qemu to
> base its offsets to UTC no matter what.  If the commandline allows a
> specification of a localtime offset, I think it should be used ONLY for
> a one-time up-front conversion into a corresponding UTC offset, and then
> execute qemu in relation to utc thereafter (therefore, migration is
> always done in terms of utc, without regards for whether source and
> destination have a different localtime).

Guest side:
* Emulated RTC CMOS clock is initialized to either UTC or localtime when the 
guest initializes.
* Guest reads RTC CMOS clock on boot, or if explicitly requested to
during runtime, and transfers that value to its system time.

* Migration maintains the emulated RTC CMOS clock value, but a
subsequent VM restart will use the destination hosts localtime,
in case -rtc base=localtime.

So using 

-rtc base= "localtime_r(time())" (this is a date) on VM creation

and then maintaining that base, along with guest RTC writes, across VM
restarts, seems much saner than ever using -rtc base=localtime on a cluster
with different timezones.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23  3:50     ` Marcelo Tosatti
@ 2014-05-23  9:17       ` Laine Stump
  2014-05-23 10:19         ` Laine Stump
  2014-05-23 12:43       ` Luiz Capitulino
  1 sibling, 1 reply; 16+ messages in thread
From: Laine Stump @ 2014-05-23  9:17 UTC (permalink / raw)
  To: Marcelo Tosatti, Eric Blake
  Cc: libvir-list, qemu-devel@nongnu.org, Luiz Capitulino

On 05/23/2014 06:50 AM, Marcelo Tosatti wrote:
> On Thu, May 22, 2014 at 01:33:14PM -0600, Eric Blake wrote:
>> [Adding qemu]
>>
>> On 05/22/2014 05:07 AM, Laine Stump wrote:
>>> commit e31b5cf393857 attempted to fix libvirt's
>>> VIR_DOMAIN_EVENT_ID_RTC_CHANGE, which is documentated to always
>> s/documentated/documented/
>>
>>> provide the new offset of the domain's real time clock from UTC. The
>>> problem was that, in the case that qemu is provided with an "-rtc
>>> base=x" where x is an absolute time (rather than "utc" or
>>> "localtime"), the offset sent by qemu's RTC_CHANGE event is *not* the
>>> new offset from UTC, but rather is the sum of all changes to the
>>> domain's RTC since it was started with base=x.
>>>
>>> So, despite what was said in commit e31b5cf393857, if we assume that
>>> the original value stored in "adjustment" was the offset from UTC at
>>> the time the domain was started, we can always determine the current
>>> offset from UTC by simply adding the most recent (i.e. current) offset
>>> from qemu to that original adjustment.
>> Is this true even if we miss an RTC update event from qemu?  I'm worried
>> about the following situation:
>>
>> user prepares to do a libvirtd upgrade, so libvirtd is shut down. 
> If adjustment0 field is updated from adjustment, via a libvirtd shutdown, the current
> patch will also break, i believe. Not sure if thats possible, though.

No, adjustment0 is only set at the time a new qemu process is started.

>
>> Then the guest triggers an RTC update, so qemu sends an event, but the
>> event is lost. Then libvirtd starts again, and doesn't realize the
>> event is lost.

That case would only be a problem until the *next* time an RTC update is
sent; at that time the adjustment would be readjusted to adjustment0 +
new offset (and that new offset is the cumulative sum of all adjustments
since the domain was started).

> Yes, but that case is also true for any other QMP asynchronous event,
> and therefore should be handled generically i suppose (QMP channel data
> should be maintained across libvirtd shutdown). Luiz?
>
>> Do we need more help from qemu, such as a new field to an existing QMP
>> command (or a new QMP command) that lists the cumulative offset that
>> qemu is using, where we call that query command any time after an RTC
>> update event or after a libvirtd restart? I'm wondering if this is more
>> a bug in qemu for not providing the right information rather than
>> libvirt's responsibility to work around it.  If the only way to keep
>> accurate information is to sum the values we get from events, we are at
>> risk of a lost event getting us messed up.
> Good point, unsure whether its specific to this command, though. Could
> add a new QMP command to query the RTC offset, yes.
>
>>> This patch accomplishes that by storing the initial adjustment in the
>>> domain's status as "adjustment0". Each time a new RTC_CHANGE event is
>>> received from qemu, we simply add adjustment0 to the value sent by
>>> qemu, store that as the new adjustment, and forward that value on to
>>> any event handler.
>>>
>>> This patch (*not* e31b5cf393857, which should be reverted prior to
>>> applying this patch) fixes:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=964177
>>>
>>> (for the case where basis='utc'. It does not fix basis='localtime')
>>> ---
>>>
>>> Changes from V1: remove all attempts to fix basis='localtime' in favor
>>> of fixing it in a simpler and better manner in a separate patch.
>> I'd also appreciate it if the qemu developers can chime in on what is
>> supposed to happen for localtime guests.
>>
>> There are at least four combinations:
>>
>> host running on UTC bios time, guest running on UTC time (in my opinion,
>> the only sane setting, but we're talking about reality not sanity)
> 1) Why does Windows keep your BIOS clock on local time?
> http://blogs.msdn.com/b/oldnewthing/archive/2004/09/02/224672.aspx
>
> 2) Windows with RTC UTC
> https://wiki.archlinux.org/index.php/Time#UTC_in_Windows
>
>> host running on UTC, guest running on localtime (perhaps the guest is
>> windows, and we know that windows prefers to run on localtime)
> Its the default, thats all. See 1) above.
>
>> host running on localtime bios (perhaps because it is dual-boot with
>> windows, and windows prefers bios in localtime), guest running on UTC time
> Can't see why hosts using localtime or UTC is relevant. Assume host is
> synchronized to UTC via NTP (so you can use UTC or convert to localtime
> if desired).

Correct - the host's use of its own RTC is irrelevant. The only
potential difference is basis='utc' vs. basis='localtime', and anyone
using basis='localtime' more or less deserves the confusion it causes
:-P (That really is a "tongue in cheek" emoticon - I'm only half serious).

*However*, this discussion forced me to investigate some of the basic
assumptions that I'd been making when coming in to fix this bug. In
particular, my assumption was that the value of "adjustment" that was
set in the status would be preserved across a domain save/restore
operation, or a migration, but after talking to jdenemar and looking at
the code, I believe that this is *not* the case. As I understand it, one
of the driving factors behind having adjustment='variable' was that
changes to the RTC would be properly maintained across save/resoter and
migrations, and thus I ASSumed that the setting *was* being maintained,
but that the math was wrong. As far as I can tell, though, only the
*inactive* XML is saved in a save image or sent in a migration, while
the modified adjustment is only in the transient/status XML; the result
is that the modified adjustment (and modified basis per patch 4/4) are
lost any time there is a save/restore or migration.

So while these patches are correcting the math, I guess any claims that
I make in the commit logs about them fixing problems with migration or
save/restore are ill-informed, and should be removed (and we should
file/track a separate bug about that issue).

>> host running on localtime, guest running on localtime
>>
>> But it gets even more complicated.  The host localtime need not be
>> consistent with the guest localtime.  That is, I could be a cloud
>> provider with servers on the east coast, and renting out processor time
>> to a client on the west coast that wants their guest tied to west coast
>> localtime.  And that's assuming that both host and guest switch in and
>> out of daylight savings at the same time, which falls apart when you
>> cross political boundaries.  Then there's the fun of migration (what if
>> my server farm is spread across multiple timezones - does migration take
>> into account the difference in localtime between source and destination
>> servers).

I'm pretty sure that Daniel's suggestion (which I've implemented in
patch 4) removes all of the problems *that don't involve migration or
save/restore* to the extent that they can be removed (and they *do* set
the adjustment/basis in the status such that if they *were* preserved
across migrate or save/restore, behavior would then be correct) although
there is still a problem if you don't know which timezone the initial
host will be in, but want your guest to have RTC set to the localtime of
a particular timezone - I suppose for that we would need to expand
"basis='utc|localtime'" to allow specifying a particular timezone, then
learn the time of that timezone when qemu is started. That's beyond the
scope of this "bugfix" patch series though.


>>
>> I can _totally_ understand the desire to run a GUEST in such a way that
>> the guest thinks it has a bios stored in localtime (and when the guest
>> updates the RTC twice a year to account for daylight savings, it changes
>> what offset we track about the guest).  But I think it is INSANITY to
>> ever try and run a host on a localtime system (daylight savings changes
>> in the host are just asking for problems to the guests) - so even if the
>> host is tied to localtime bios, it is still probably wiser for qemu to
>> base its offsets to UTC no matter what.  If the commandline allows a
>> specification of a localtime offset, I think it should be used ONLY for
>> a one-time up-front conversion into a corresponding UTC offset, and then
>> execute qemu in relation to utc thereafter (therefore, migration is
>> always done in terms of utc, without regards for whether source and
>> destination have a different localtime).

Yep :-)

> Guest side:
> * Emulated RTC CMOS clock is initialized to either UTC or localtime when the 
> guest initializes.
> * Guest reads RTC CMOS clock on boot, or if explicitly requested to
> during runtime, and transfers that value to its system time.
>
> * Migration maintains the emulated RTC CMOS clock value, but a
> subsequent VM restart will use the destination hosts localtime,
> in case -rtc base=localtime.
>
> So using 
>
> -rtc base= "localtime_r(time())" (this is a date) on VM creation
>
> and then maintaining that base, along with guest RTC writes, across VM
> restarts, seems much saner than ever using -rtc base=localtime on a cluster
> with different timezones.
>
>

Again, yep :-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23  9:17       ` Laine Stump
@ 2014-05-23 10:19         ` Laine Stump
  2014-05-23 14:54           ` Eric Blake
  0 siblings, 1 reply; 16+ messages in thread
From: Laine Stump @ 2014-05-23 10:19 UTC (permalink / raw)
  To: Marcelo Tosatti, Eric Blake; +Cc: libvir-list, qemu-devel@nongnu.org

On 05/23/2014 12:17 PM, Laine Stump wrote:
> *However*, this discussion forced me to investigate some of the basic
> assumptions that I'd been making when coming in to fix this bug. In
> particular, my assumption was that the value of "adjustment" that was
> set in the status would be preserved across a domain save/restore
> operation, or a migration, but after talking to jdenemar and looking
> at the code, I believe that this is *not* the case.

Okay, disregard this "sky is falling" outburst. I was misreading the
code and misinterpreting what jdenemar told me. The updated value of
adjustment and basis *are* properly preserved across save/restore and
migrate.

The problem Eric pointed out is real though (if the domain RTC is
modified while libvirtd isn't available to catch the RTC_CHANGE event,
libvirt will have an incorrect idea of adjustment. This will be
temporary until another RTC_CHANGE event happens *unless* the domain is
migrated or saved/restored before another RTC_CHANGE happens, in which
case the incorrectness will be semi-permanent (until the domain is
completely stopped and restarted).)

Also, as I understand it, this means that if a domain is migrated with
persistence, the new domain on the destination will have made the change
to adjustment and basis permanent, but if it stays on the same host that
change will only be valid until the domain is destroyed - next time it
is started it will go back to the original settings.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23  3:50     ` Marcelo Tosatti
  2014-05-23  9:17       ` Laine Stump
@ 2014-05-23 12:43       ` Luiz Capitulino
  2014-05-23 13:35         ` Markus Armbruster
  1 sibling, 1 reply; 16+ messages in thread
From: Luiz Capitulino @ 2014-05-23 12:43 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: libvir-list, qemu-devel@nongnu.org, Laine Stump

On Fri, 23 May 2014 00:50:38 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> > Then the guest triggers an RTC update, so qemu sends an event, but the
> > event is lost. Then libvirtd starts again, and doesn't realize the
> > event is lost.
> 
> Yes, but that case is also true for any other QMP asynchronous event,
> and therefore should be handled generically i suppose (QMP channel data
> should be maintained across libvirtd shutdown). Luiz?

Maintaining QMP channel data doesn't solve this problem, because all sorts
of race conditions are still possible. For example, libvirt could crash
after having received the event but before handling it.

The most reliable way we found to solve this problem, and that's what we
do for other events, is to allow libvirt to query the information the event
is reporting. An event is nothing more than a state change in QEMU, and QEMU
state is persistent during the life time of the VM, so we allow libvirt to
query the state of anything that may send an event.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 12:43       ` Luiz Capitulino
@ 2014-05-23 13:35         ` Markus Armbruster
  2014-05-23 13:48           ` Marcelo Tosatti
  2014-05-23 21:36           ` Paolo Bonzini
  0 siblings, 2 replies; 16+ messages in thread
From: Markus Armbruster @ 2014-05-23 13:35 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: libvir-list, Marcelo Tosatti, qemu-devel@nongnu.org, Laine Stump

Luiz Capitulino <lcapitulino@redhat.com> writes:

> On Fri, 23 May 2014 00:50:38 -0300
> Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
>> > Then the guest triggers an RTC update, so qemu sends an event, but the
>> > event is lost. Then libvirtd starts again, and doesn't realize the
>> > event is lost.
>> 
>> Yes, but that case is also true for any other QMP asynchronous event,
>> and therefore should be handled generically i suppose (QMP channel data
>> should be maintained across libvirtd shutdown). Luiz?
>
> Maintaining QMP channel data doesn't solve this problem, because all sorts
> of race conditions are still possible. For example, libvirt could crash
> after having received the event but before handling it.
>
> The most reliable way we found to solve this problem, and that's what we
> do for other events, is to allow libvirt to query the information the event
> is reporting. An event is nothing more than a state change in QEMU, and QEMU
> state is persistent during the life time of the VM, so we allow libvirt to
> query the state of anything that may send an event.

In fact, this is a general rule: when libvirt tracks an event, it also
needs a way to poll for the information in the event.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 13:35         ` Markus Armbruster
@ 2014-05-23 13:48           ` Marcelo Tosatti
  2014-05-23 13:54             ` Marcelo Tosatti
                               ` (3 more replies)
  2014-05-23 21:36           ` Paolo Bonzini
  1 sibling, 4 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2014-05-23 13:48 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: libvir-list, Laine Stump, qemu-devel@nongnu.org, Luiz Capitulino

On Fri, May 23, 2014 at 03:35:19PM +0200, Markus Armbruster wrote:
> Luiz Capitulino <lcapitulino@redhat.com> writes:
> 
> > On Fri, 23 May 2014 00:50:38 -0300
> > Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> >> > Then the guest triggers an RTC update, so qemu sends an event, but the
> >> > event is lost. Then libvirtd starts again, and doesn't realize the
> >> > event is lost.
> >> 
> >> Yes, but that case is also true for any other QMP asynchronous event,
> >> and therefore should be handled generically i suppose (QMP channel data
> >> should be maintained across libvirtd shutdown). Luiz?
> >
> > Maintaining QMP channel data doesn't solve this problem, because all sorts
> > of race conditions are still possible. For example, libvirt could crash
> > after having received the event but before handling it.
> >
> > The most reliable way we found to solve this problem, and that's what we
> > do for other events, is to allow libvirt to query the information the event
> > is reporting. An event is nothing more than a state change in QEMU, and QEMU
> > state is persistent during the life time of the VM, so we allow libvirt to
> > query the state of anything that may send an event.
> 
> In fact, this is a general rule: when libvirt tracks an event, it also
> needs a way to poll for the information in the event.

I see.

This also seems pretty harmful wrt losing events:

/* Global, one-time initializer to configure the rate limiting
 * and initialize state */
static void monitor_protocol_event_init(void)
{
    /* Limit RTC & BALLOON events to 1 per second */
    monitor_protocol_event_throttle(QEVENT_RTC_CHANGE, 1000);

Better remove it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 13:48           ` Marcelo Tosatti
@ 2014-05-23 13:54             ` Marcelo Tosatti
  2014-05-23 13:54             ` Luiz Capitulino
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2014-05-23 13:54 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: libvir-list, Laine Stump, qemu-devel@nongnu.org, Luiz Capitulino

On Fri, May 23, 2014 at 10:48:18AM -0300, Marcelo Tosatti wrote:
> On Fri, May 23, 2014 at 03:35:19PM +0200, Markus Armbruster wrote:
> > Luiz Capitulino <lcapitulino@redhat.com> writes:
> > 
> > > On Fri, 23 May 2014 00:50:38 -0300
> > > Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > >> > Then the guest triggers an RTC update, so qemu sends an event, but the
> > >> > event is lost. Then libvirtd starts again, and doesn't realize the
> > >> > event is lost.
> > >> 
> > >> Yes, but that case is also true for any other QMP asynchronous event,
> > >> and therefore should be handled generically i suppose (QMP channel data
> > >> should be maintained across libvirtd shutdown). Luiz?
> > >
> > > Maintaining QMP channel data doesn't solve this problem, because all sorts
> > > of race conditions are still possible. For example, libvirt could crash
> > > after having received the event but before handling it.
> > >
> > > The most reliable way we found to solve this problem, and that's what we
> > > do for other events, is to allow libvirt to query the information the event
> > > is reporting. An event is nothing more than a state change in QEMU, and QEMU
> > > state is persistent during the life time of the VM, so we allow libvirt to
> > > query the state of anything that may send an event.
> > 
> > In fact, this is a general rule: when libvirt tracks an event, it also
> > needs a way to poll for the information in the event.
> 
> I see.
> 
> This also seems pretty harmful wrt losing events:
> 
> /* Global, one-time initializer to configure the rate limiting
>  * and initialize state */
> static void monitor_protocol_event_init(void)
> {
>     /* Limit RTC & BALLOON events to 1 per second */
>     monitor_protocol_event_throttle(QEVENT_RTC_CHANGE, 1000);
> 
> Better remove it.

Well, libvirt should disable throttling for events it cares about 
notifications not being lost. Is it doing so?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 13:48           ` Marcelo Tosatti
  2014-05-23 13:54             ` Marcelo Tosatti
@ 2014-05-23 13:54             ` Luiz Capitulino
  2014-05-23 13:56             ` Daniel P. Berrange
  2014-05-23 14:04             ` Eric Blake
  3 siblings, 0 replies; 16+ messages in thread
From: Luiz Capitulino @ 2014-05-23 13:54 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: libvir-list, Laine Stump, Markus Armbruster,
	qemu-devel@nongnu.org

On Fri, 23 May 2014 10:48:18 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Fri, May 23, 2014 at 03:35:19PM +0200, Markus Armbruster wrote:
> > Luiz Capitulino <lcapitulino@redhat.com> writes:
> > 
> > > On Fri, 23 May 2014 00:50:38 -0300
> > > Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > >> > Then the guest triggers an RTC update, so qemu sends an event, but the
> > >> > event is lost. Then libvirtd starts again, and doesn't realize the
> > >> > event is lost.
> > >> 
> > >> Yes, but that case is also true for any other QMP asynchronous event,
> > >> and therefore should be handled generically i suppose (QMP channel data
> > >> should be maintained across libvirtd shutdown). Luiz?
> > >
> > > Maintaining QMP channel data doesn't solve this problem, because all sorts
> > > of race conditions are still possible. For example, libvirt could crash
> > > after having received the event but before handling it.
> > >
> > > The most reliable way we found to solve this problem, and that's what we
> > > do for other events, is to allow libvirt to query the information the event
> > > is reporting. An event is nothing more than a state change in QEMU, and QEMU
> > > state is persistent during the life time of the VM, so we allow libvirt to
> > > query the state of anything that may send an event.
> > 
> > In fact, this is a general rule: when libvirt tracks an event, it also
> > needs a way to poll for the information in the event.
> 
> I see.
> 
> This also seems pretty harmful wrt losing events:
> 
> /* Global, one-time initializer to configure the rate limiting
>  * and initialize state */
> static void monitor_protocol_event_init(void)
> {
>     /* Limit RTC & BALLOON events to 1 per second */
>     monitor_protocol_event_throttle(QEVENT_RTC_CHANGE, 1000);
> 
> Better remove it.

You mean, this is causing problems to the RTC_CHANGE event or you
found a general problem? Can you give more details?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 13:48           ` Marcelo Tosatti
  2014-05-23 13:54             ` Marcelo Tosatti
  2014-05-23 13:54             ` Luiz Capitulino
@ 2014-05-23 13:56             ` Daniel P. Berrange
  2014-05-23 14:04             ` Eric Blake
  3 siblings, 0 replies; 16+ messages in thread
From: Daniel P. Berrange @ 2014-05-23 13:56 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: libvir-list, qemu-devel@nongnu.org, Luiz Capitulino,
	Markus Armbruster, Laine Stump

On Fri, May 23, 2014 at 10:48:18AM -0300, Marcelo Tosatti wrote:
> On Fri, May 23, 2014 at 03:35:19PM +0200, Markus Armbruster wrote:
> > Luiz Capitulino <lcapitulino@redhat.com> writes:
> > 
> > > On Fri, 23 May 2014 00:50:38 -0300
> > > Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > >> > Then the guest triggers an RTC update, so qemu sends an event, but the
> > >> > event is lost. Then libvirtd starts again, and doesn't realize the
> > >> > event is lost.
> > >> 
> > >> Yes, but that case is also true for any other QMP asynchronous event,
> > >> and therefore should be handled generically i suppose (QMP channel data
> > >> should be maintained across libvirtd shutdown). Luiz?
> > >
> > > Maintaining QMP channel data doesn't solve this problem, because all sorts
> > > of race conditions are still possible. For example, libvirt could crash
> > > after having received the event but before handling it.
> > >
> > > The most reliable way we found to solve this problem, and that's what we
> > > do for other events, is to allow libvirt to query the information the event
> > > is reporting. An event is nothing more than a state change in QEMU, and QEMU
> > > state is persistent during the life time of the VM, so we allow libvirt to
> > > query the state of anything that may send an event.
> > 
> > In fact, this is a general rule: when libvirt tracks an event, it also
> > needs a way to poll for the information in the event.
> 
> I see.
> 
> This also seems pretty harmful wrt losing events:
> 
> /* Global, one-time initializer to configure the rate limiting
>  * and initialize state */
> static void monitor_protocol_event_init(void)
> {
>     /* Limit RTC & BALLOON events to 1 per second */
>     monitor_protocol_event_throttle(QEVENT_RTC_CHANGE, 1000);
> 
> Better remove it.

That is intentionally designed such that it doesn't cause any real
problems actually - the monitor rate limiting code will only drop
intermediate events - it is guaranteed you'll get the most recent
event after the rate limiting period elapse. eg if the guest OS
emits 6 events in the space on 1 second:

   RTC_CHANGE 353
   RTC_CHANGE 1338
   RTC_CHANGE 3542
   RTC_CHANGE 255
   RTC_CHANGE 522
   RTC_CHANGE 320

then, the monitor rate limiting may discard all except the very
last event. ie libvirt will see RTC_CHANGE == 320. The fact that
it didn't see the previous events is no problem, because they're
obsoleted by the new event.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 13:48           ` Marcelo Tosatti
                               ` (2 preceding siblings ...)
  2014-05-23 13:56             ` Daniel P. Berrange
@ 2014-05-23 14:04             ` Eric Blake
  3 siblings, 0 replies; 16+ messages in thread
From: Eric Blake @ 2014-05-23 14:04 UTC (permalink / raw)
  To: Marcelo Tosatti, Markus Armbruster
  Cc: libvir-list, qemu-devel@nongnu.org, Laine Stump

[-- Attachment #1: Type: text/plain, Size: 1233 bytes --]

On 05/23/2014 07:48 AM, Marcelo Tosatti wrote:

> This also seems pretty harmful wrt losing events:
> 
> /* Global, one-time initializer to configure the rate limiting
>  * and initialize state */
> static void monitor_protocol_event_init(void)
> {
>     /* Limit RTC & BALLOON events to 1 per second */
>     monitor_protocol_event_throttle(QEVENT_RTC_CHANGE, 1000);
> 
> Better remove it.

No, this throttling MUST be present to avoid a guest-triggered
denial-of-service attack (otherwise the guest could trigger RTC change
events fast enough to starve the host in dealing with them).  Again,
lost events are expected.  What is important is that when throttling,
and event is guaranteed to be sent at the end of the throttling period
so that libvirt will always eventually get an event with the latest
state (even if intermediate events were lost) with no more latency than
the throttling period; and either the event caries the current state
(and not something that needs accumulation), or the event is a witness
that libvirt needs to do an additional query- command to learn the
current state.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 10:19         ` Laine Stump
@ 2014-05-23 14:54           ` Eric Blake
  2014-05-23 16:42             ` Marcelo Tosatti
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Eric Blake @ 2014-05-23 14:54 UTC (permalink / raw)
  To: Laine Stump, Marcelo Tosatti; +Cc: libvir-list, qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 7498 bytes --]

On 05/23/2014 04:19 AM, Laine Stump wrote:
> On 05/23/2014 12:17 PM, Laine Stump wrote:
>> *However*, this discussion forced me to investigate some of the basic
>> assumptions that I'd been making when coming in to fix this bug. In
>> particular, my assumption was that the value of "adjustment" that was
>> set in the status would be preserved across a domain save/restore
>> operation, or a migration, but after talking to jdenemar and looking
>> at the code, I believe that this is *not* the case.
> 
> Okay, disregard this "sky is falling" outburst. I was misreading the
> code and misinterpreting what jdenemar told me. The updated value of
> adjustment and basis *are* properly preserved across save/restore and
> migrate.

Okay, now to rephrase things to see if I understand correctly, or maybe
add more confusion to the discussion.

If I understand it, the qemu event is always outputting the 'current
adjustment to the command-line offset', and not 'the delta applied to
the most recent RTC change'; while libvirt is trying to report 'the
current delta that the guest is using in relation to UTC'.  If the
command line was specified with UTC as the basis (that is, 0
command-line offset), then the qemu offset happens to also be the
libvirt desired offset from UTC.  If the command line was specified with
any other offset, then the offset from UTC is always the sum of the
command line initial offset + the current offset reported by the event,
and libvirt is not altering the initial offset while the guest runs.
And I don't think libvirt has a way for management to query the current
offset (libvirt was only tracking the current offset of running domains
in its private xml).

For a guest that uses localtime bios, you would start the guest
initially with a non-zero command-line offset (representing the offset
of the timezone the guest thinks it is running in).  Twice a year, the
guest will change its RTC by an hour, and the GOAL is that if we stop
the guest and then restart it later, the restarted guest will resume
with bios at the same offset from UTC as what it last had (that is, the
freshly-booted guest will behave as if bios has silently advanced by the
amount of time that the guest was not running, similar to how bare-metal
hardware has a battery powering the RTC even when the box is completely
off and unplugged).  If, during the downtime, the guest has crossed a
daylight savings boundary, we don't care - the guest OS upon boot will
recognize that it is using localtime bios, and that it missed a dst
boundary while the machine was powered off, and so it will presumably do
an RTC update by an hour fairly early in its boot process - which will
be caught as an RTC update event so we once again know the new offset to
be applied the next time we boot the guest.  For this to work, a
persistent guest definition needs to record the offset that was in use
at the time the guest powered off, and the next time qemu is started,
pass _that_ offset as the command-line offset against UTC.  In libvirt's
case, we intentionally run qemu to stay alive even after the guest
quits, and send an event that the guest is no longer running; this event
could be our trigger to read the qemu offset and adjust the persistent
state to track the new offset to use on the next boot of the guest.

More interesting is migration (whether by saving to a file or by
migration between hosts).  Libvirt is able to query the current qemu
offset prior to starting the migration, but what happens if the guest
changes the RTC on the source after the time that libvirt did the query
but before the guest is paused in preparation for the destination to
start running the guest?  Is the RTC offset transmitted as part of the
migration stream, even if it is a different offset at the time the
migration finally converges than what it was when the migration started?
 Furthermore, what command line offset should libvirt tell the
destination machine to use, since qemu is only tracking the RTC offset
relative to the command line, and not the offset relative to UTC?  Is
there a race here?

More concretely, suppose I start a guest with a command-line offset of 3
hours.  Then the guest does an RTC change to 4 hours (an offset of one
hour from the command line).  Then I want to do a migration - do I tell
the destination qemu to start with a 3 hour offset (identical to what
the source had) and the migration stream corrects it so that the
destination picks up with RTC already at 4 hours (and the guest sees no
discontinuity)? Or does libvirt have to query the current offset at the
time it gets ready to start the migration, and start the destination
with a command-line offset of 4 hours, because the offset is not
migrated?  If the latter, what happens if the guest changes RTC in
between when libvirt queries the source for the value to give to the
destination, vs. actually starting the migration?  It sounds like the
only sane way for migration to work is for the RTC offset to be part of
the stream, and that the destination must be started with the SAME
command line as the source; and therefore, the only time the command
line should be altered is when the guest is shut down (not for live
migration) - where libvirt must update the domain XML to track the
offset in use at the time the guest shuts down.

Meanwhile, I know we also recently added the qemu-guest-agent command to
set time, with two modes: 1. tell the guest what UTC time it should be
using, and update RTC to match (which will trigger an RTC change event;
and perhaps can be used to snoop whether the guest is setting RTC to a
localtime rather than UTC time); 2. tell the guest to re-read the RTC
and adjust its local time to match (which pre-supposes that RTC is
accurate).  The idea is that after management does something where the
guest has a long downtime (either migration to file, or a long suspend),
the guest's internal notion of time did not advance (because the CPU was
not running) while the RTC did advance (because qemu is still exposing
RTC relative to the host's UTC), so the management will tell the guest
to resync time as part of resuming.

Now, what good does the RTC change event do, if libvirt really only
needs to query the current RTC offset at the time shuts down?  Given
that the guest-agent command deals in UTC (when setting time), does the
management app need to know when the guest changes RTC due to daylight
savings?  Or is the case where management tells guest to reset its own
time by re-reading RTC matter for management to know whether that RTC is
not tracking UTC?

Also, it seems like most people want their guests to run with a proper
notion of current time, even over gaps of time where the guest is not
running (true if the guest is connected to a network and expects to do
anything where being in sync with other computers is important).  But is
there ever someone that wants a guest to run as though it were seeing
all possible time values, and thus where every time the guest is paused
and then resumed, its offset from UTC is increased by the amount of
downtime that elapsed (most likely, for a standalone guest with no
emulated network connections)?  Do we need to do anything different for
a management app trying to run a guest in such a mode?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 14:54           ` Eric Blake
@ 2014-05-23 16:42             ` Marcelo Tosatti
  2014-05-23 17:56             ` Eric Blake
  2014-05-23 18:31             ` Laine Stump
  2 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2014-05-23 16:42 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, qemu-devel@nongnu.org, Laine Stump

On Fri, May 23, 2014 at 08:54:54AM -0600, Eric Blake wrote:
> On 05/23/2014 04:19 AM, Laine Stump wrote:
> > On 05/23/2014 12:17 PM, Laine Stump wrote:
> >> *However*, this discussion forced me to investigate some of the basic
> >> assumptions that I'd been making when coming in to fix this bug. In
> >> particular, my assumption was that the value of "adjustment" that was
> >> set in the status would be preserved across a domain save/restore
> >> operation, or a migration, but after talking to jdenemar and looking
> >> at the code, I believe that this is *not* the case.
> > 
> > Okay, disregard this "sky is falling" outburst. I was misreading the
> > code and misinterpreting what jdenemar told me. The updated value of
> > adjustment and basis *are* properly preserved across save/restore and
> > migrate.
> 
> Okay, now to rephrase things to see if I understand correctly, or maybe
> add more confusion to the discussion.
> 
> If I understand it, the qemu event is always outputting the 'current
> adjustment to the command-line offset', and not 'the delta applied to
> the most recent RTC change'; while libvirt is trying to report 'the
> current delta that the guest is using in relation to UTC'.  If the
> command line was specified with UTC as the basis (that is, 0
> command-line offset), then the qemu offset happens to also be the
> libvirt desired offset from UTC.  If the command line was specified with
> any other offset, then the offset from UTC is always the sum of the
> command line initial offset + the current offset reported by the event,
> and libvirt is not altering the initial offset while the guest runs.

It reports the offset to a running clock. That running clock is either
UTC or UTC+basedate, where basedate has been specified in -rtc
base=date.

> And I don't think libvirt has a way for management to query the current
> offset (libvirt was only tracking the current offset of running domains
> in its private xml).

No, have to fix, good point.

> For a guest that uses localtime bios, you would start the guest
> initially with a non-zero command-line offset (representing the offset
> of the timezone the guest thinks it is running in).  Twice a year, the
> guest will change its RTC by an hour, and the GOAL is that if we stop
> the guest and then restart it later, the restarted guest will resume
> with bios at the same offset from UTC as what it last had (that is, the
> freshly-booted guest will behave as if bios has silently advanced by the
> amount of time that the guest was not running, similar to how bare-metal
> hardware has a battery powering the RTC even when the box is completely
> off and unplugged).  If, during the downtime, the guest has crossed a
> daylight savings boundary, we don't care - the guest OS upon boot will
> recognize that it is using localtime bios, and that it missed a dst
> boundary while the machine was powered off, and so it will presumably do
> an RTC update by an hour fairly early in its boot process - which will
> be caught as an RTC update event so we once again know the new offset to
> be applied the next time we boot the guest.

Should start the RTC in VM creation with a given localtime, because Windows
expects RTC time in localtime format. After that, maintain whatever
offset the guest wrote to RTC afterwards, and emulate RTC CMOS
semantics by advancing time when guest is powered off.

> For this to work, a
> persistent guest definition needs to record the offset that was in use
> at the time the guest powered off, and the next time qemu is started,
> pass _that_ offset as the command-line offset against UTC.  In libvirt's
> case, we intentionally run qemu to stay alive even after the guest
> quits, and send an event that the guest is no longer running; this event
> could be our trigger to read the qemu offset and adjust the persistent
> state to track the new offset to use on the next boot of the guest.

Only have to query offset if libvirtd is shutdown, otherwise value from RTC_EVENT 
notification is sufficient.

> More interesting is migration (whether by saving to a file or by
> migration between hosts).  Libvirt is able to query the current qemu
> offset prior to starting the migration, but what happens if the guest
> changes the RTC on the source after the time that libvirt did the query
> but before the guest is paused in preparation for the destination to
> start running the guest?  Is the RTC offset transmitted as part of the
> migration stream, 

Yes.

> even if it is a different offset at the time the migration finally
> converges than what it was when the migration started? Furthermore,
> what command line offset should libvirt tell the destination machine
> to use, 

Always -rtc base="offset of guest RTC to UTC".

> since qemu is only tracking the RTC offset relative to the
> command line, and not the offset relative to UTC? Is there a race
> here?

But the command line date should be "UTC + offset". So if you track
against that, you can track against UTC.

> More concretely, suppose I start a guest with a command-line offset of 3
> hours.  Then the guest does an RTC change to 4 hours (an offset of one
> hour from the command line).  Then I want to do a migration - do I tell
> the destination qemu to start with a 3 hour offset (identical to what
> the source had) and the migration stream corrects it so that the
> destination picks up with RTC already at 4 hours (and the guest sees no
> discontinuity)? Or does libvirt have to query the current offset at the
> time it gets ready to start the migration, and start the destination
> with a command-line offset of 4 hours, because the offset is not
> migrated?  If the latter, what happens if the guest changes RTC in
> between when libvirt queries the source for the value to give to the
> destination, vs. actually starting the migration?  It sounds like the
> only sane way for migration to work is for the RTC offset to be part of
> the stream, and that the destination must be started with the SAME
> command line as the source; and therefore, the only time the command
> line should be altered is when the guest is shut down (not for live
> migration) - where libvirt must update the domain XML to track the
> offset in use at the time the guest shuts down.

Libvirt has to query the RTC offset only on libvirt reinitialization.
Otherwise, use value from RTC_EVENT.

> Meanwhile, I know we also recently added the qemu-guest-agent command to
> set time, with two modes: 1. tell the guest what UTC time it should be
> using, and update RTC to match (which will trigger an RTC change event;
> and perhaps can be used to snoop whether the guest is setting RTC to a
> localtime rather than UTC time); 2. tell the guest to re-read the RTC
> and adjust its local time to match (which pre-supposes that RTC is
> accurate).  The idea is that after management does something where the
> guest has a long downtime (either migration to file, or a long suspend),
> the guest's internal notion of time did not advance (because the CPU was
> not running) while the RTC did advance (because qemu is still exposing
> RTC relative to the host's UTC), so the management will tell the guest
> to resync time as part of resuming.

Yes.

> 
> Now, what good does the RTC change event do, if libvirt really only
> needs to query the current RTC offset at the time shuts down?  Given
> that the guest-agent command deals in UTC (when setting time), does the
> management app need to know when the guest changes RTC due to daylight
> savings?  Or is the case where management tells guest to reset its own
> time by re-reading RTC matter for management to know whether that RTC is
> not tracking UTC?

Management does not need to know anything about guest RTC. What it needs
to do is:

1) Initialize it to a given timezone's localtime value instead of UTC value, because
Windows assumes localtime value by default, on VM creation. Or UTC, if
desired (user choice).

2) Maintain whatever value the guest writes into its RTC.

3) When the guest is off, the RTC counts (just as real RTC).

> Also, it seems like most people want their guests to run with a proper
> notion of current time, even over gaps of time where the guest is not
> running (true if the guest is connected to a network and expects to do
> anything where being in sync with other computers is important).  But is
> there ever someone that wants a guest to run as though it were seeing
> all possible time values, and thus where every time the guest is paused
> and then resumed, its offset from UTC is increased by the amount of
> downtime that elapsed (most likely, for a standalone guest with no
> emulated network connections)?  Do we need to do anything different for
> a management app trying to run a guest in such a mode?

Guest RTC clock should count when guest is paused, or shutdown. That
covers all cases? (because it mimicks real RTC).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 14:54           ` Eric Blake
  2014-05-23 16:42             ` Marcelo Tosatti
@ 2014-05-23 17:56             ` Eric Blake
  2014-05-23 18:31             ` Laine Stump
  2 siblings, 0 replies; 16+ messages in thread
From: Eric Blake @ 2014-05-23 17:56 UTC (permalink / raw)
  To: Laine Stump, Marcelo Tosatti; +Cc: libvir-list, qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 2614 bytes --]

On 05/23/2014 08:54 AM, Eric Blake wrote:

>   For this to work, a
> persistent guest definition needs to record the offset that was in use
> at the time the guest powered off, and the next time qemu is started,
> pass _that_ offset as the command-line offset against UTC.  In libvirt's
> case, we intentionally run qemu to stay alive even after the guest
> quits, and send an event that the guest is no longer running; this event
> could be our trigger to read the qemu offset and adjust the persistent
> state to track the new offset to use on the next boot of the guest.

Hmm, I wonder if we have a libvirt design dilemma on our hands.
Remember, ever since qemu exposed the -no-shutdown flag, we have been
using it in libvirt.  It says that even though the guest has quit
executing, the qemu process must stick around until we explicitly let go
of it.  This is because the guest can shut down during a libvirtd
restart, so the new libvirtd can still connect to the qemu process,
learn that the guest has shut down, and take appropriate actions (query
qemu for other details like the final state of the RTC offset, tidy up
any transient XML, generate the libvirt event to management app, etc).
As long as the libvirt guest is persistent, the management app can still
query about the guest even after it shuts down.

But VDSM uses only transient guests.  Similar to how libvirt uses
-no-shutdown to keep qemu hanging around until libvirt acknowledges that
the guest is done, what recourse does VDSM have to keep a transient
guest's virDomainPtr around (even across libvirtd restarts) until VDSM
has had time to query the RTC offset from libvirt?  That is, since the
only libvirt use of RTC offset is to adjust the persistent definition
for the next boot, but VDSM is using only transient guests, VDSM needs a
way to track the RTC offset to tell libvirt to use the next time VDSM
creates a new transient guest for the same disk image.  Or put another
way, the moment that libvirt offloads persistence to a higher level
application, that higher level application needs a way to grab all
information that libvirt would have grabbed for a persistent guest,
which means a transient guest needs to exist in the shutoff state for at
least as long as VDSM hasn't closed it.  Do we have a design flaw in
that currently libvirt gets rid of transient guests as soon as they
shutdown, rather than waiting for management to close their last
reference to the shutdown domain?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 14:54           ` Eric Blake
  2014-05-23 16:42             ` Marcelo Tosatti
  2014-05-23 17:56             ` Eric Blake
@ 2014-05-23 18:31             ` Laine Stump
  2 siblings, 0 replies; 16+ messages in thread
From: Laine Stump @ 2014-05-23 18:31 UTC (permalink / raw)
  To: libvir-list; +Cc: Marcelo Tosatti, qemu-devel@nongnu.org

On 05/23/2014 05:54 PM, Eric Blake wrote:
> On 05/23/2014 04:19 AM, Laine Stump wrote:
>> On 05/23/2014 12:17 PM, Laine Stump wrote:
>>> *However*, this discussion forced me to investigate some of the basic
>>> assumptions that I'd been making when coming in to fix this bug. In
>>> particular, my assumption was that the value of "adjustment" that was
>>> set in the status would be preserved across a domain save/restore
>>> operation, or a migration, but after talking to jdenemar and looking
>>> at the code, I believe that this is *not* the case.
>> Okay, disregard this "sky is falling" outburst. I was misreading the
>> code and misinterpreting what jdenemar told me. The updated value of
>> adjustment and basis *are* properly preserved across save/restore and
>> migrate.
> Okay, now to rephrase things to see if I understand correctly, or maybe
> add more confusion to the discussion.
>
> If I understand it, the qemu event is always outputting the 'current
> adjustment to the command-line offset', and not 'the delta applied to
> the most recent RTC change';

Correct. It reports the new offset of the RTC relative to a hypothetical
clock that was started when the domain started, and hasn't been adjusted
at all.

>  while libvirt is trying to report 'the
> current delta that the guest is using in relation to UTC'.  If the
> command line was specified with UTC as the basis (that is, 0
> command-line offset), then the qemu offset happens to also be the
> libvirt desired offset from UTC. 

Well, if it was "<clock offset='variable' basis='utc' adjustment='0'/>",
then that is true.


>  If the command line was specified with
> any other offset, then the offset from UTC is always the sum of the
> command line initial offset + the current offset reported by the event,

Correct.

> and libvirt is not altering the initial offset while the guest runs.

Altering where? libvirt does change "adjustment" in the domain status to
reflect the most recent RTC_CHANGE event. It does this by preserving the
initial adjustment, and adding that to the offset in the RTC_CHANGE
event; this is set as the new adjustment.

> And I don't think libvirt has a way for management to query the current
> offset (libvirt was only tracking the current offset of running domains
> in its private xml).
>
> For a guest that uses localtime bios, you would start the guest
> initially with a non-zero command-line offset (representing the offset
> of the timezone the guest thinks it is running in).  Twice a year, the
> guest will change its RTC by an hour, and the GOAL is that if we stop
> the guest and then restart it later, the restarted guest will resume
> with bios at the same offset from UTC as what it last had (that is, the
> freshly-booted guest will behave as if bios has silently advanced by the
> amount of time that the guest was not running, similar to how bare-metal
> hardware has a battery powering the RTC even when the box is completely
> off and unplugged).  If, during the downtime, the guest has crossed a
> daylight savings boundary, we don't care - the guest OS upon boot will
> recognize that it is using localtime bios, and that it missed a dst
> boundary while the machine was powered off, and so it will presumably do
> an RTC update by an hour fairly early in its boot process - which will
> be caught as an RTC update event so we once again know the new offset to
> be applied the next time we boot the guest.

All good up to here.

>   For this to work, a
> persistent guest definition needs to record the offset that was in use
> at the time the guest powered off, and the next time qemu is started,
> pass _that_ offset as the command-line offset against UTC. 

but this part has never been there. We only modify the adjustment in the
live XML, never in the persistent XML.

>  In libvirt's
> case, we intentionally run qemu to stay alive even after the guest
> quits, and send an event that the guest is no longer running; this event
> could be our trigger to read the qemu offset and adjust the persistent
> state to track the new offset to use on the next boot of the guest.
>
> More interesting is migration (whether by saving to a file or by
> migration between hosts).  Libvirt is able to query the current qemu
> offset prior to starting the migration, but what happens if the guest
> changes the RTC on the source after the time that libvirt did the query
> but before the guest is paused in preparation for the destination to
> start running the guest?

Yes - that is one of my questions.

>   Is the RTC offset transmitted as part of the
> migration stream, even if it is a different offset at the time the
> migration finally converges than what it was when the migration started?

My uninformed guess has been that even if the RTC is part of the
migration stream, that it will be overridden by what is put on the qemu
commandline on the destination host.

>  Furthermore, what command line offset should libvirt tell the
> destination machine to use, since qemu is only tracking the RTC offset
> relative to the command line, and not the offset relative to UTC?  Is
> there a race here?
>
> More concretely, suppose I start a guest with a command-line offset of 3
> hours.  Then the guest does an RTC change to 4 hours (an offset of one
> hour from the command line).  Then I want to do a migration - do I tell
> the destination qemu to start with a 3 hour offset (identical to what
> the source had) and the migration stream corrects it so that the
> destination picks up with RTC already at 4 hours (and the guest sees no
> discontinuity)? Or does libvirt have to query the current offset at the
> time it gets ready to start the migration, and start the destination
> with a command-line offset of 4 hours, because the offset is not
> migrated?  If the latter, what happens if the guest changes RTC in
> between when libvirt queries the source for the value to give to the
> destination, vs. actually starting the migration?  It sounds like the
> only sane way for migration to work is for the RTC offset to be part of
> the stream, and that the destination must be started with the SAME
> command line as the source; and therefore, the only time the command
> line should be altered is when the guest is shut down (not for live
> migration) - where libvirt must update the domain XML to track the
> offset in use at the time the guest shuts down.

But of course this isn't how it works now. So whatever we do is either
wrong now, or wrong in the future.

>
> Meanwhile, I know we also recently added the qemu-guest-agent command to
> set time, with two modes: 1. tell the guest what UTC time it should be
> using, and update RTC to match (which will trigger an RTC change event;
> and perhaps can be used to snoop whether the guest is setting RTC to a
> localtime rather than UTC time); 2. tell the guest to re-read the RTC
> and adjust its local time to match (which pre-supposes that RTC is
> accurate).  The idea is that after management does something where the
> guest has a long downtime (either migration to file, or a long suspend),
> the guest's internal notion of time did not advance (because the CPU was
> not running) while the RTC did advance (because qemu is still exposing
> RTC relative to the host's UTC), so the management will tell the guest
> to resync time as part of resuming.
>
> Now, what good does the RTC change event do, if libvirt really only
> needs to query the current RTC offset at the time shuts down?  Given
> that the guest-agent command deals in UTC (when setting time), does the
> management app need to know when the guest changes RTC due to daylight
> savings?  Or is the case where management tells guest to reset its own
> time by re-reading RTC matter for management to know whether that RTC is
> not tracking UTC?
>
> Also, it seems like most people want their guests to run with a proper
> notion of current time, even over gaps of time where the guest is not
> running (true if the guest is connected to a network and expects to do
> anything where being in sync with other computers is important).  But is
> there ever someone that wants a guest to run as though it were seeing
> all possible time values, and thus where every time the guest is paused
> and then resumed, its offset from UTC is increased by the amount of
> downtime that elapsed (most likely, for a standalone guest with no
> emulated network connections)?  Do we need to do anything different for
> a management app trying to run a guest in such a mode?
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/>
  2014-05-23 13:35         ` Markus Armbruster
  2014-05-23 13:48           ` Marcelo Tosatti
@ 2014-05-23 21:36           ` Paolo Bonzini
  1 sibling, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2014-05-23 21:36 UTC (permalink / raw)
  To: Markus Armbruster, Luiz Capitulino
  Cc: libvir-list, Marcelo Tosatti, qemu-devel@nongnu.org, Laine Stump

Il 23/05/2014 15:35, Markus Armbruster ha scritto:
> Luiz Capitulino <lcapitulino@redhat.com> writes:
>
>> On Fri, 23 May 2014 00:50:38 -0300
>> Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>
>>>> Then the guest triggers an RTC update, so qemu sends an event, but the
>>>> event is lost. Then libvirtd starts again, and doesn't realize the
>>>> event is lost.
>>>
>>> Yes, but that case is also true for any other QMP asynchronous event,
>>> and therefore should be handled generically i suppose (QMP channel data
>>> should be maintained across libvirtd shutdown). Luiz?
>>
>> Maintaining QMP channel data doesn't solve this problem, because all sorts
>> of race conditions are still possible. For example, libvirt could crash
>> after having received the event but before handling it.
>>
>> The most reliable way we found to solve this problem, and that's what we
>> do for other events, is to allow libvirt to query the information the event
>> is reporting. An event is nothing more than a state change in QEMU, and QEMU
>> state is persistent during the life time of the VM, so we allow libvirt to
>> query the state of anything that may send an event.
>
> In fact, this is a general rule: when libvirt tracks an event, it also
> needs a way to poll for the information in the event.
>

It can be polled even right now.  It's not pretty, but it's doable.

You can get the current time via the qom-get command, and then follow 
the same algorithm as QEMU:

     time_t seconds;

     if (rtc_date_offset == -1) {
         if (rtc_utc) {
             seconds = mktimegm(tm);
         } else {
             struct tm tmp = *tm;
             tmp.tm_isdst = -1; /* use timezone to figure it out */
             seconds = mktime(&tmp);
         }
     } else {
         seconds = mktimegm(tm) + rtc_date_offset;
     }
     return seconds - time(NULL);

Unfortunately the QOM path to the RTC device is not stable.  We can add 
a /machine/rtc link, and if the PPC guys implement the link and 
current-time property as well, the same mechanism can work for any board.

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-05-23 21:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1400756850-19807-1-git-send-email-laine@laine.org>
     [not found] ` <1400756850-19807-4-git-send-email-laine@laine.org>
2014-05-22 19:33   ` [Qemu-devel] [libvirt] [PATCHv2 3/4] qemu: fix RTC_CHANGE event for <clock offset='variable' basis='utc'/> Eric Blake
2014-05-23  3:50     ` Marcelo Tosatti
2014-05-23  9:17       ` Laine Stump
2014-05-23 10:19         ` Laine Stump
2014-05-23 14:54           ` Eric Blake
2014-05-23 16:42             ` Marcelo Tosatti
2014-05-23 17:56             ` Eric Blake
2014-05-23 18:31             ` Laine Stump
2014-05-23 12:43       ` Luiz Capitulino
2014-05-23 13:35         ` Markus Armbruster
2014-05-23 13:48           ` Marcelo Tosatti
2014-05-23 13:54             ` Marcelo Tosatti
2014-05-23 13:54             ` Luiz Capitulino
2014-05-23 13:56             ` Daniel P. Berrange
2014-05-23 14:04             ` Eric Blake
2014-05-23 21:36           ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).