* [Qemu-devel] Rethinking missed tick catchup
From: Anthony Liguori @ 2012-09-12 13:54 UTC
To: qemu-devel
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, Luiz Capitulino,
Avi Kivity, Paolo Bonzini, Eric Blake
Hi,
We've been running into a lot of problems lately with Windows guests and
I think they all ultimately could be addressed by revisiting the missed
tick catchup algorithms that we use. Mike and I spent a while talking
about it yesterday and I wanted to take the discussion to the list to
get some additional input.
Here are the problems we're seeing:
1) Rapid reinjection can lead to time moving faster for short bursts of
time. We've seen a number of RTC watchdog BSoDs and it's possible
that at least one cause is reinjection speed.
2) When hibernating a host system, the guest is essentially paused
for a long period of time. This results in a very large tick catchup
while also resulting in a large skew in guest time.
I've gotten reports of the tick catchup consuming a lot of CPU time
from rapid delivery of interrupts (although I haven't reproduced this
yet).
3) Windows appears to have a service that periodically syncs the guest
time with the hardware clock. I've been told the resync period is an
hour. For large clock skews, this can compete with reinjection
resulting in a positive skew in time (the guest can be ahead of the
host).
I've been thinking about an algorithm like this to address these
problems:
A) Limit the number of interrupts that we reinject to the equivalent of
a small period of wallclock time. Something like 60 seconds.
B) In the event of (A), trigger a notification in QEMU. This is easy
for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
revisit usage of the in-kernel PIT?
C) On accumulated tick overflow, rely on using a qemu-ga command to
force a resync of the guest's time to the hardware wallclock time.
D) Whenever the guest reads the wallclock time from the RTC, reset all
accumulated ticks.
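To make the shape of (A)-(D) concrete, here is a rough C sketch; the
names and the notification hook are illustrative, not existing QEMU
code:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* (A): cap reinjection at the equivalent of a small wallclock window. */
#define MAX_CATCHUP_NS (60LL * 1000000000LL)   /* 60 seconds, tunable */

typedef struct RTCTimer {
    int64_t period_ns;   /* programmed tick period */
    int64_t coalesced;   /* ticks owed to the guest */
    bool    overflowed;
} RTCTimer;

/* (B): stand-in for raising a QMP event; not an existing QEMU API. */
static void notify_tick_overflow(RTCTimer *t)
{
    (void)t;
    fprintf(stderr, "tick backlog exceeded; resync via qemu-ga (C)\n");
}

/* Called when a tick fires while the previous one is still pending. */
static void tick_missed(RTCTimer *t)
{
    if (t->coalesced * t->period_ns >= MAX_CATCHUP_NS) {
        if (!t->overflowed) {
            notify_tick_overflow(t);
            t->overflowed = true;
        }
        return;          /* stop accumulating; (C) takes over */
    }
    t->coalesced++;
}

/* (D): a guest read of the RTC wallclock makes the backlog moot. */
static void rtc_wallclock_read(RTCTimer *t)
{
    t->coalesced = 0;
    t->overflowed = false;
}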
In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
discussed a low-impact way of doing this (having a separate dispatch
path for guest agent commands) and I'm confident we could do this for
1.3.
This would mean that management tools would need to consume qemu-ga
through QMP. Not sure if this is a problem for anyone.
I'm not sure whether it's worth trying to support this with the
in-kernel PIT or not either.
Are there other issues with reinjection that people are aware of? Does
anything seem obviously wrong with the above?
Regards,
Anthony Liguori
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Jan Kiszka @ 2012-09-12 14:21 UTC
To: Anthony Liguori
Cc: Michael Roth, Gleb Natapov, qemu-devel@nongnu.org,
Luiz Capitulino, Avi Kivity, Paolo Bonzini, Eric Blake
On 2012-09-12 15:54, Anthony Liguori wrote:
>
> Hi,
>
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use. Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
>
> Here are the problems we're seeing:
>
> 1) Rapid reinjection can lead to time moving faster for short bursts of
> time. We've seen a number of RTC watchdog BSoDs and it's possible
> that at least one cause is reinjection speed.
>
> 2) When hibernating a host system, the guest is essentially paused
> for a long period of time. This results in a very large tick catchup
> while also resulting in a large skew in guest time.
>
> I've gotten reports of the tick catchup consuming a lot of CPU time
> from rapid delivery of interrupts (although I haven't reproduced this
> yet).
>
> 3) Windows appears to have a service that periodically syncs the guest
> time with the hardware clock. I've been told the resync period is an
> hour. For large clock skews, this can compete with reinjection
> resulting in a positive skew in time (the guest can be ahead of the
> host).
>
> I've been thinking about an algorithm like this to address these
> problems:
>
> A) Limit the number of interrupts that we reinject to the equivalent of
> a small period of wallclock time. Something like 60 seconds.
>
> B) In the event of (A), trigger a notification in QEMU. This is easy
> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> revisit usage of the in-kernel PIT?
>
> C) On accumulated tick overflow, rely on using a qemu-ga command to
> force a resync of the guest's time to the hardware wallclock time.
>
> D) Whenever the guest reads the wallclock time from the RTC, reset all
> accumulated ticks.
>
> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.
>
> This would mean that management tools would need to consume qemu-ga
> through QMP. Not sure if this is a problem for anyone.
>
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
As with our current discussion around fixing the PIC and its impact on
the PIT, we should try on the userspace model first and then check if
the design can be adapted to support in-kernel as well.
For which guests is the PIT important again? Old Linux kernels? Windows
should be mostly happy with the RTC - or the HPET.
>
> Are there other issues with reinjection that people are aware of? Does
> anything seem obviously wrong with the above?
We should take the chance and design everything in a way that the HPET
can finally be (left) enabled.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Anthony Liguori @ 2012-09-12 14:44 UTC
To: Jan Kiszka
Cc: Michael Roth, Gleb Natapov, qemu-devel@nongnu.org,
Luiz Capitulino, Avi Kivity, Paolo Bonzini, Eric Blake
Jan Kiszka <jan.kiszka@siemens.com> writes:
> On 2012-09-12 15:54, Anthony Liguori wrote:
>>
>> Hi,
>>
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use. Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>>
>> Here are the problems we're seeing:
>>
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>> time. We've seen a number of RTC watchdog BSoDs and it's possible
>> that at least one cause is reinjection speed.
>>
>> 2) When hibernating a host system, the guest is essentially paused
>> for a long period of time. This results in a very large tick catchup
>> while also resulting in a large skew in guest time.
>>
>> I've gotten reports of the tick catchup consuming a lot of CPU time
>> from rapid delivery of interrupts (although I haven't reproduced this
>> yet).
>>
>> 3) Windows appears to have a service that periodically syncs the guest
>> time with the hardware clock. I've been told the resync period is an
>> hour. For large clock skews, this can compete with reinjection
>> resulting in a positive skew in time (the guest can be ahead of the
>> host).
>>
>> I've been thinking about an algorithm like this to address these
>> problems:
>>
>> A) Limit the number of interrupts that we reinject to the equivalent of
>> a small period of wallclock time. Something like 60 seconds.
>>
>> B) In the event of (A), trigger a notification in QEMU. This is easy
>> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
>> revisit usage of the in-kernel PIT?
>>
>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>> force a resync of the guest's time to the hardware wallclock time.
>>
>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>> accumulated ticks.
>>
>> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
>> discussed a low-impact way of doing this (having a separate dispatch
>> path for guest agent commands) and I'm confident we could do this for
>> 1.3.
>>
>> This would mean that management tools would need to consume qemu-ga
>> through QMP. Not sure if this is a problem for anyone.
>>
>> I'm not sure whether it's worth trying to support this with the
>> in-kernel PIT or not either.
>
> As with our current discussion around fixing the PIC and its impact on
> the PIT, we should try on the userspace model first and then check if
> the design can be adapted to support in-kernel as well.
>
> For which guests is the PIT important again? Old Linux kernels? Windows
> should be mostly happy with the RTC - or the HPET.
I thought that only 64-bit Win2k8+ used the RTC.
I thought win2k3 and even 32-bit win2k8 still used the PIT.
>> Are there other issues with reinjection that people are aware of? Does
>> anything seem obviously wrong with the above?
>
> We should take the chance and design everything in a way that the HPET
> can finally be (left) enabled.
I thought the issue with the HPET was access frequency and the cost of
heavy weight exits.
I don't have concrete data here. I've only heard it second hand. Can
anyone comment more?
Regards,
Anthony Liguori
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Jan Kiszka @ 2012-09-12 14:50 UTC
To: Anthony Liguori
Cc: Michael Roth, Gleb Natapov, qemu-devel@nongnu.org,
Luiz Capitulino, Avi Kivity, Paolo Bonzini, Eric Blake
On 2012-09-12 16:44, Anthony Liguori wrote:
> Jan Kiszka <jan.kiszka@siemens.com> writes:
>
>> On 2012-09-12 15:54, Anthony Liguori wrote:
>>>
>>> Hi,
>>>
>>> We've been running into a lot of problems lately with Windows guests and
>>> I think they all ultimately could be addressed by revisiting the missed
>>> tick catchup algorithms that we use. Mike and I spent a while talking
>>> about it yesterday and I wanted to take the discussion to the list to
>>> get some additional input.
>>>
>>> Here are the problems we're seeing:
>>>
>>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>> time. We've seen a number of RTC watchdog BSoDs and it's possible
>>> that at least one cause is reinjection speed.
>>>
>>> 2) When hibernating a host system, the guest is essentially paused
>>> for a long period of time. This results in a very large tick catchup
>>> while also resulting in a large skew in guest time.
>>>
>>> I've gotten reports of the tick catchup consuming a lot of CPU time
>>> from rapid delivery of interrupts (although I haven't reproduced this
>>> yet).
>>>
>>> 3) Windows appears to have a service that periodically syncs the guest
>>> time with the hardware clock. I've been told the resync period is an
>>> hour. For large clock skews, this can compete with reinjection
>>> resulting in a positive skew in time (the guest can be ahead of the
>>> host).
>>>
>>> I've been thinking about an algorithm like this to address these
>>> problems:
>>>
>>> A) Limit the number of interrupts that we reinject to the equivalent of
>>> a small period of wallclock time. Something like 60 seconds.
>>>
>>> B) In the event of (A), trigger a notification in QEMU. This is easy
>>> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
>>> revisit usage of the in-kernel PIT?
>>>
>>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>>> force a resync of the guest's time to the hardware wallclock time.
>>>
>>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>> accumulated ticks.
>>>
>>> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
>>> discussed a low-impact way of doing this (having a separate dispatch
>>> path for guest agent commands) and I'm confident we could do this for
>>> 1.3.
>>>
>>> This would mean that management tools would need to consume qemu-ga
>>> through QMP. Not sure if this is a problem for anyone.
>>>
>>> I'm not sure whether it's worth trying to support this with the
>>> in-kernel PIT or not either.
>>
>> As with our current discussion around fixing the PIC and its impact on
>> the PIT, we should try on the userspace model first and then check if
>> the design can be adapted to support in-kernel as well.
>>
>> For which guests is the PIT important again? Old Linux kernels? Windows
>> should be mostly happy with the RTC - or the HPET.
>
> I thought that only 64-bit Win2k8+ used the RTC.
>
> I thought win2k3 and even 32-bit win2k8 still used the PIT.
Hmm, might be true.
>
>>> Are there other issues with reinjection that people are aware of? Does
>>> anything seem obviously wrong with the above?
>>
>> We should take the chance and design everything in a way that the HPET
>> can finally be (left) enabled.
>
> I thought the issue with the HPET was access frequency and the cost of
> heavy weight exits.
Well, with common Win7-64 you can choose between RTC and HPET. The
former works well, the latter breaks timing. Both require userspace
exits. But the HPET is enabled by default, so manual tuning is needed to
get things right.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Gleb Natapov @ 2012-09-12 15:06 UTC
To: Anthony Liguori
Cc: Michael Roth, Jan Kiszka, qemu-devel@nongnu.org, Luiz Capitulino,
Avi Kivity, Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 09:44:10AM -0500, Anthony Liguori wrote:
> Jan Kiszka <jan.kiszka@siemens.com> writes:
>
> > On 2012-09-12 15:54, Anthony Liguori wrote:
> >>
> >> Hi,
> >>
> >> We've been running into a lot of problems lately with Windows guests and
> >> I think they all ultimately could be addressed by revisiting the missed
> >> tick catchup algorithms that we use. Mike and I spent a while talking
> >> about it yesterday and I wanted to take the discussion to the list to
> >> get some additional input.
> >>
> >> Here are the problems we're seeing:
> >>
> >> 1) Rapid reinjection can lead to time moving faster for short bursts of
> >> time. We've seen a number of RTC watchdog BSoDs and it's possible
> >> that at least one cause is reinjection speed.
> >>
> >> 2) When hibernating a host system, the guest is essentially paused
> >> for a long period of time. This results in a very large tick catchup
> >> while also resulting in a large skew in guest time.
> >>
> >> I've gotten reports of the tick catchup consuming a lot of CPU time
> >> from rapid delivery of interrupts (although I haven't reproduced this
> >> yet).
> >>
> >> 3) Windows appears to have a service that periodically syncs the guest
> >> time with the hardware clock. I've been told the resync period is an
> >> hour. For large clock skews, this can compete with reinjection
> >> resulting in a positive skew in time (the guest can be ahead of the
> >> host).
> >>
> >> I've been thinking about an algorithm like this to address these
> >> problems:
> >>
> >> A) Limit the number of interrupts that we reinject to the equivalent of
> >> a small period of wallclock time. Something like 60 seconds.
> >>
> >> B) In the event of (A), trigger a notification in QEMU. This is easy
> >> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> >> revisit usage of the in-kernel PIT?
> >>
> >> C) On accumulated tick overflow, rely on using a qemu-ga command to
> >> force a resync of the guest's time to the hardware wallclock time.
> >>
> >> D) Whenever the guest reads the wallclock time from the RTC, reset all
> >> accumulated ticks.
> >>
> >> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> >> discussed a low-impact way of doing this (having a separate dispatch
> >> path for guest agent commands) and I'm confident we could do this for
> >> 1.3.
> >>
> >> This would mean that management tools would need to consume qemu-ga
> >> through QMP. Not sure if this is a problem for anyone.
> >>
> >> I'm not sure whether it's worth trying to support this with the
> >> in-kernel PIT or not either.
> >
> > As with our current discussion around fixing the PIC and its impact on
> > the PIT, we should try on the userspace model first and then check if
> > the design can be adapted to support in-kernel as well.
> >
> > For which guests is the PIT important again? Old Linux kernels? Windows
> > should be mostly happy with the RTC - or the HPET.
>
> I thought that only 64-bit Win2k8+ used the RTC.
>
> I thought win2k3 and even 32-bit win2k8 still used the PIT.
>
Only the Windows XP non-ACPI HAL uses the PIT. Every other Windows
version uses the RTC. In other words, we do not care about the PIT.
> >> Are there other issues with reinjection that people are aware of? Does
> >> anything seem obviously wrong with the above?
> >
> > We should take the chance and design everything in a way that the HPET
> > can finally be (left) enabled.
>
> I thought the issue with the HPET was access frequency and the cost of
> heavy weight exits.
>
> I don't have concrete data here. I've only heard it second hand. Can
> anyone comment more?
>
There is no reason whatsoever to emulate the HPET for Windows. It will
only make it slower. Hyper-V does not emulate it. For proper time support
in Windows we need to implement the relevant part of the Hyper-V spec.
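For reference, the timekeeping part of that spec boils down to a
reference TSC page the guest reads instead of counting ticks. A rough
guest-side sketch (layout simplified, the sequence-based retry the spec
requires is omitted, and it uses the GCC __int128 extension):

#include <stdint.h>

typedef struct {
    volatile uint32_t tsc_sequence;   /* 0 means fall back to the MSR */
    uint32_t reserved;
    volatile uint64_t tsc_scale;
    volatile int64_t  tsc_offset;
} HvRefTscPage;

/* Reference time in 100 ns units:
 * time = ((tsc * scale) >> 64) + offset */
static uint64_t hv_reference_time(const HvRefTscPage *p, uint64_t tsc)
{
    return (uint64_t)(((unsigned __int128)tsc * p->tsc_scale) >> 64)
           + (uint64_t)p->tsc_offset;
}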
--
Gleb.
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Gleb Natapov @ 2012-09-12 15:15 UTC
To: Anthony Liguori
Cc: Michael Roth, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 08:54:26AM -0500, Anthony Liguori wrote:
>
> Hi,
>
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use. Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
>
> Here are the problems we're seeing:
>
> 1) Rapid reinjection can lead to time moving faster for short bursts of
> time. We've seen a number of RTC watchdog BSoDs and it's possible
> that at least one cause is reinjection speed.
>
> 2) When hibernating a host system, the guest is essentially paused
> for a long period of time. This results in a very large tick catchup
> while also resulting in a large skew in guest time.
>
> I've gotten reports of the tick catchup consuming a lot of CPU time
> from rapid delivery of interrupts (although I haven't reproduced this
> yet).
>
> 3) Windows appears to have a service that periodically syncs the guest
> time with the hardware clock. I've been told the resync period is an
> hour. For large clock skews, this can compete with reinjection
> resulting in a positive skew in time (the guest can be ahead of the
> host).
>
> I've been thinking about an algorithm like this to address these
> problems:
>
> A) Limit the number of interrupts that we reinject to the equivalent of
> a small period of wallclock time. Something like 60 seconds.
>
How will this fix the BSOD problem, for instance? 60 seconds is long
enough to cause all the problems you are talking about above. We can
easily make the amount of accumulated ticks configurable, though, to
play with and see.
> B) In the event of (A), trigger a notification in QEMU. This is easy
> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> revisit usage of the in-kernel PIT?
>
PIT does not matter for Windows guests.
> C) On accumulated tick overflow, rely on using a qemu-ga command to
> force a resync of the guest's time to the hardware wallclock time.
>
Needs guest cooperation.
> D) Whenever the guest reads the wallclock time from the RTC, reset all
> accumulated ticks.
>
> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.
>
> This would mean that management tools would need to consume qemu-ga
> through QMP. Not sure if this is a problem for anyone.
>
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
>
> Are there other issues with reinjection that people are aware of? Does
> anything seem obviously wrong with the above?
>
It looks like you are trying to solve only pathologically big time drift
problems. Those do not happen normally.
--
Gleb.
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Jan Kiszka @ 2012-09-12 15:42 UTC
To: Gleb Natapov
Cc: Michael Roth, qemu-devel@nongnu.org, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini, Eric Blake
On 2012-09-12 17:06, Gleb Natapov wrote:
>>>> Are there other issues with reinjection that people are aware of? Does
>>>> anything seem obviously wrong with the above?
>>>
>>> We should take the chance and design everything in a way that the HPET
>>> can finally be (left) enabled.
>>
>> I thought the issue with the HPET was access frequency and the cost of
>> heavy weight exits.
>>
>> I don't have concrete data here. I've only heard it second hand. Can
>> anyone comment more?
>>
> There is no reason whatsoever to emulate the HPET for Windows. It will
> only make it slower. Hyper-V does not emulate it. For proper time support
> in Windows we need to implement the relevant part of the Hyper-V spec.
There are two reasons to do it nevertheless:
- QEMU is not Hyper-V. We are emulating the HPET already, and we
expose it by default. So we should do it properly.
- The time drift fix for the RTC is still a hack. Adding a second user
would force us to finally clean it up.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Gleb Natapov @ 2012-09-12 15:45 UTC
To: Jan Kiszka
Cc: Michael Roth, qemu-devel@nongnu.org, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 05:42:58PM +0200, Jan Kiszka wrote:
> On 2012-09-12 17:06, Gleb Natapov wrote:
> >>>> Are there other issues with reinjection that people are aware of? Does
> >>>> anything seem obviously wrong with the above?
> >>>
> >>> We should take the chance and design everything in a way that the HPET
> >>> can finally be (left) enabled.
> >>
> >> I thought the issue with the HPET was access frequency and the cost of
> >> heavy weight exits.
> >>
> >> I don't have concrete data here. I've only heard it second hand. Can
> >> anyone comment more?
> >>
> > There is no reason whatsoever to emulate the HPET for Windows. It will
> > only make it slower. Hyper-V does not emulate it. For proper time support
> > in Windows we need to implement the relevant part of the Hyper-V spec.
>
> There are two reasons to do it nevertheless:
>
> - QEMU is not Hyper-V. We are emulating the HPET already, and we
> expose it by default. So we should do it properly.
>
> - The time drift fix for the RTC is still a hack. Adding a second user
> would force us to finally clean it up.
>
I am not saying we should not emulate the HPET in QEMU, I am saying there
is no reason to emulate it for Windows :)
--
Gleb.
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Gleb Natapov @ 2012-09-12 16:16 UTC
To: Anthony Liguori
Cc: qemu-devel@nongnu.org, Jan Kiszka, Michael Roth, Luiz Capitulino,
Avi Kivity, Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 06:06:47PM +0300, Gleb Natapov wrote:
> On Wed, Sep 12, 2012 at 09:44:10AM -0500, Anthony Liguori wrote:
> > Jan Kiszka <jan.kiszka@siemens.com> writes:
> >
> > > On 2012-09-12 15:54, Anthony Liguori wrote:
> > >>
> > >> Hi,
> > >>
> > >> We've been running into a lot of problems lately with Windows guests and
> > >> I think they all ultimately could be addressed by revisiting the missed
> > >> tick catchup algorithms that we use. Mike and I spent a while talking
> > >> about it yesterday and I wanted to take the discussion to the list to
> > >> get some additional input.
> > >>
> > >> Here are the problems we're seeing:
> > >>
> > >> 1) Rapid reinjection can lead to time moving faster for short bursts of
> > >> time. We've seen a number of RTC watchdog BSoDs and it's possible
> > >> that at least one cause is reinjection speed.
> > >>
> > >> 2) When hibernating a host system, the guest is essentially paused
> > >> for a long period of time. This results in a very large tick catchup
> > >> while also resulting in a large skew in guest time.
> > >>
> > >> I've gotten reports of the tick catchup consuming a lot of CPU time
> > >> from rapid delivery of interrupts (although I haven't reproduced this
> > >> yet).
> > >>
> > >> 3) Windows appears to have a service that periodically syncs the guest
> > >> time with the hardware clock. I've been told the resync period is an
> > >> hour. For large clock skews, this can compete with reinjection
> > >> resulting in a positive skew in time (the guest can be ahead of the
> > >> host).
> > >>
> > >> I've been thinking about an algorithm like this to address these
> > >> problems:
> > >>
> > >> A) Limit the number of interrupts that we reinject to the equivalent of
> > >> a small period of wallclock time. Something like 60 seconds.
> > >>
> > >> B) In the event of (A), trigger a notification in QEMU. This is easy
> > >> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> > >> revisit usage of the in-kernel PIT?
> > >>
> > >> C) On accumulated tick overflow, rely on using a qemu-ga command to
> > >> force a resync of the guest's time to the hardware wallclock time.
> > >>
> > >> D) Whenever the guest reads the wallclock time from the RTC, reset all
> > >> accumulated ticks.
> > >>
> > >> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> > >> discussed a low-impact way of doing this (having a separate dispatch
> > >> path for guest agent commands) and I'm confident we could do this for
> > >> 1.3.
> > >>
> > >> This would mean that management tools would need to consume qemu-ga
> > >> through QMP. Not sure if this is a problem for anyone.
> > >>
> > >> I'm not sure whether it's worth trying to support this with the
> > >> in-kernel PIT or not either.
> > >
> > > As with our current discussion around fixing the PIC and its impact on
> > > the PIT, we should try on the userspace model first and then check if
> > > the design can be adapted to support in-kernel as well.
> > >
> > > For which guests is the PIT important again? Old Linux kernels? Windows
> > > should be mostly happy with the RTC - or the HPET.
> >
> > I thought that only 64-bit Win2k8+ used the RTC.
> >
> > I thought win2k3 and even 32-bit win2k8 still used the PIT.
> >
> Only the Windows XP non-ACPI HAL uses the PIT. Every other Windows
> version uses the RTC. In other words, we do not care about the PIT.
>
A small clarification: they use the RTC if the HPET is not present. I
don't know in which version Windows started to prefer the HPET over the RTC.
--
Gleb.
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Stefan Weil @ 2012-09-12 16:27 UTC
To: Anthony Liguori
Cc: Michael Roth, Gleb Natapov, Jan Kiszka, qemu-devel,
Luiz Capitulino, Avi Kivity, Paolo Bonzini, Eric Blake
Am 12.09.2012 15:54, schrieb Anthony Liguori:
>
> Hi,
>
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use. Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
>
> Here are the problems we're seeing:
>
> 1) Rapid reinjection can lead to time moving faster for short bursts of
> time. We've seen a number of RTC watchdog BSoDs and it's possible
> that at least one cause is reinjection speed.
>
> 2) When hibernating a host system, the guest is essentially paused
> for a long period of time. This results in a very large tick catchup
> while also resulting in a large skew in guest time.
>
> I've gotten reports of the tick catchup consuming a lot of CPU time
> from rapid delivery of interrupts (although I haven't reproduced this
> yet).
>
> 3) Windows appears to have a service that periodically syncs the guest
> time with the hardware clock. I've been told the resync period is an
> hour. For large clock skews, this can compete with reinjection
> resulting in a positive skew in time (the guest can be ahead of the
> host).
Nearly every modern OS (including Windows) uses NTP
or some other protocol to get the time over the network.
If a guest OS detects a small difference in time, it will usually
accelerate or decelerate the OS clock until the time is
synchronised again.
Large jumps in network time will make the OS time jump, too.
With a little bad luck, QEMU's reinjection will add a
positive skew, no matter whether the guest is Linux or Windows.
>
> I've been thinking about an algorithm like this to address these
> problems:
>
> A) Limit the number of interrupts that we reinject to the equivalent of
> a small period of wallclock time. Something like 60 seconds.
>
> B) In the event of (A), trigger a notification in QEMU. This is easy
> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> revisit usage of the in-kernel PIT?
>
> C) On accumulated tick overflow, rely on using a qemu-ga command to
> force a resync of the guest's time to the hardware wallclock time.
>
> D) Whenever the guest reads the wallclock time from the RTC, reset all
> accumulated ticks.
D) makes no sense, see my comment above.
Injection of additional timer interrupts should not be needed
after a hibernation. The guest must handle that situation
by reading either the hw clock (which must be updated
by QEMU when it resumes from hibernate) or by using
another time reference (like NTP, for example).
>
> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.
>
> This would mean that management tools would need to consume qemu-ga
> through QMP. Not sure if this is a problem for anyone.
>
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
>
> Are there other issues with reinjection that people are aware of? Does
> anything seem obviously wrong with the above?
>
> Regards,
>
> Anthony Liguori
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Gleb Natapov @ 2012-09-12 16:45 UTC
To: Stefan Weil
Cc: Michael Roth, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >
> >Hi,
> >
> >We've been running into a lot of problems lately with Windows guests and
> >I think they all ultimately could be addressed by revisiting the missed
> >tick catchup algorithms that we use. Mike and I spent a while talking
> >about it yesterday and I wanted to take the discussion to the list to
> >get some additional input.
> >
> >Here are the problems we're seeing:
> >
> >1) Rapid reinjection can lead to time moving faster for short bursts of
> > time. We've seen a number of RTC watchdog BSoDs and it's possible
> > that at least one cause is reinjection speed.
> >
> >2) When hibernating a host system, the guest is essentially paused
> > for a long period of time. This results in a very large tick catchup
> > while also resulting in a large skew in guest time.
> >
> > I've gotten reports of the tick catchup consuming a lot of CPU time
> > from rapid delivery of interrupts (although I haven't reproduced this
> > yet).
> >
> >3) Windows appears to have a service that periodically syncs the guest
> > time with the hardware clock. I've been told the resync period is an
> > hour. For large clock skews, this can compete with reinjection
> > resulting in a positive skew in time (the guest can be ahead of the
> > host).
>
> Nearly every modern OS (including Windows) uses NTP
> or some other protocol to get the time over the network.
>
The drifts we are talking about will take ages for NTP to fix.
> If a guest OS detects a small difference of time, it will usually
> accelerate or decelerate the OS clock until the time is
> synchronised again.
>
> Large jumps in network time will make the OS time jump, too.
> With a little bad luck, QEMU's reinjection will add the
> positive skew, no matter whether the guest is Linux or Windows.
>
As far as I know NTP will never make the OS clock jump. The purpose of
NTP is to fix time gradually, so apps will not notice. npdate is used to
force clock synchronization, but it should be run manually.
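The distinction maps onto the two classic interfaces; a minimal C
illustration (both calls need root, and the 2-second step is just for
show):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    /* NTP-style gradual correction: adjtime() slews the clock slightly
     * faster or slower until the delta is absorbed; no visible jump. */
    struct timeval delta = { .tv_sec = 2, .tv_usec = 0 };
    if (adjtime(&delta, NULL) != 0)
        perror("adjtime");

    /* ntpdate-style forced resync: settimeofday() steps the clock, so
     * time visibly jumps - hence run it manually, with care. */
    struct timeval now;
    gettimeofday(&now, NULL);
    now.tv_sec += 2;
    if (settimeofday(&now, NULL) != 0)
        perror("settimeofday");
    return 0;
}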
> >
> >I've been thinking about an algorithm like this to address these
> >problems:
> >
> >A) Limit the number of interrupts that we reinject to the equivalent of
> > a small period of wallclock time. Something like 60 seconds.
> >
> >B) In the event of (A), trigger a notification in QEMU. This is easy
> > for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> > revisit usage of the in-kernel PIT?
> >
> >C) On accumulated tick overflow, rely on using a qemu-ga command to
> > force a resync of the guest's time to the hardware wallclock time.
> >
> >D) Whenever the guest reads the wallclock time from the RTC, reset all
> > accumulated ticks.
>
> D) makes no sense, see my comment above.
>
> Injection of additional timer interrupts should not be needed
> after a hibernation. The guest must handle that situation
> by reading either the hw clock (which must be updated
> by QEMU when it resumes from hibernate) or by using
> another time reference (like NTP, for example).
>
He is talking about host hibernation, not guest.
> >
> >In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> >discussed a low-impact way of doing this (having a separate dispatch
> >path for guest agent commands) and I'm confident we could do this for
> >1.3.
> >
> >This would mean that management tools would need to consume qemu-ga
> >through QMP. Not sure if this is a problem for anyone.
> >
> >I'm not sure whether it's worth trying to support this with the
> >in-kernel PIT or not either.
> >
> >Are there other issues with reinjection that people are aware of? Does
> >anything seem obviously wrong with the above?
> >
> >Regards,
> >
> >Anthony Liguori
--
Gleb.
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Luiz Capitulino @ 2012-09-12 17:23 UTC
To: Anthony Liguori
Cc: Gleb Natapov, Jan Kiszka, qemu-devel, Michael Roth, Avi Kivity,
Paolo Bonzini, Eric Blake
On Wed, 12 Sep 2012 08:54:26 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:
>
> Hi,
>
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use. Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
>
> Here are the problems we're seeing:
>
> 1) Rapid reinjection can lead to time moving faster for short bursts of
> time. We've seen a number of RTC watchdog BSoDs and it's possible
> that at least one cause is reinjection speed.
>
> 2) When hibernating a host system, the guest is essentially paused
> for a long period of time. This results in a very large tick catchup
> while also resulting in a large skew in guest time.
>
> I've gotten reports of the tick catchup consuming a lot of CPU time
> from rapid delivery of interrupts (although I haven't reproduced this
> yet).
>
> 3) Windows appears to have a service that periodically syncs the guest
> time with the hardware clock. I've been told the resync period is an
> hour. For large clock skews, this can compete with reinjection
> resulting in a positive skew in time (the guest can be ahead of the
> host).
>
> I've been thinking about an algorithm like this to address these
> problems:
>
> A) Limit the number of interrupts that we reinject to the equivalent of
> a small period of wallclock time. Something like 60 seconds.
>
> B) In the event of (A), trigger a notification in QEMU. This is easy
> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> revisit usage of the in-kernel PIT?
>
> C) On accumulated tick overflow, rely on using a qemu-ga command to
> force a resync of the guest's time to the hardware wallclock time.
>
> D) Whenever the guest reads the wallclock time from the RTC, reset all
> accumulated ticks.
>
> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.
Fine with me, but note that we're only two or three commands away from
having the qapi conversion done. So, it's possible that we'll merge this
and re-do it a few weeks later.
> This would mean that management tools would need to consume qemu-ga
> through QMP. Not sure if this is a problem for anyone.
Shouldn't be a problem I think.
>
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
>
> Are there other issues with reinjection that people are aware of? Does
> anything seem obviously wrong with the above?
>
> Regards,
>
> Anthony Liguori
>
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Stefan Weil @ 2012-09-12 17:30 UTC
To: Gleb Natapov
Cc: Michael Roth, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini, Eric Blake
Am 12.09.2012 18:45, schrieb Gleb Natapov:
> On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
>> Am 12.09.2012 15:54, schrieb Anthony Liguori:
>>> Hi,
>>>
>>> We've been running into a lot of problems lately with Windows guests and
>>> I think they all ultimately could be addressed by revisiting the missed
>>> tick catchup algorithms that we use. Mike and I spent a while talking
>>> about it yesterday and I wanted to take the discussion to the list to
>>> get some additional input.
>>>
>>> Here are the problems we're seeing:
>>>
>>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>> time. We've seen a number of RTC watchdog BSoDs and it's possible
>>> that at least one cause is reinjection speed.
>>>
>>> 2) When hibernating a host system, the guest is essentially paused
>>> for a long period of time. This results in a very large tick catchup
>>> while also resulting in a large skew in guest time.
>>>
>>> I've gotten reports of the tick catchup consuming a lot of CPU time
>>> from rapid delivery of interrupts (although I haven't reproduced this
>>> yet).
>>>
>>> 3) Windows appears to have a service that periodically syncs the guest
>>> time with the hardware clock. I've been told the resync period is an
>>> hour. For large clock skews, this can compete with reinjection
>>> resulting in a positive skew in time (the guest can be ahead of the
>>> host).
>> Nearly each modern OS (including Windows) uses NTP
>> or some other protocol to get the time via a TCP network.
>>
> The drifts we are talking about will take ages for NTP to fix.
>
>> If a guest OS detects a small difference of time, it will usually
>> accelerate or decelerate the OS clock until the time is
>> synchronised again.
>>
>> Large jumps in network time will make the OS time jump, too.
>> With a little bad luck, QEMU's reinjection will add the
>> positive skew, no matter whether the guest is Linux or Windows.
>>
> >As far as I know NTP will never make the OS clock jump. The purpose of
> >NTP is to fix time gradually, so apps will not notice. npdate is used to
> >force clock synchronization, but it should be run manually.
s/npdate/ntpdate. Yes, some Linux distros run it at system start,
and it's also common to call it every hour (a poor man's NTP; it uses
fewer resources).
>
>>> I've been thinking about an algorithm like this to address these
>>> problems:
>>>
>>> A) Limit the number of interrupts that we reinject to the equivalent of
>>> a small period of wallclock time. Something like 60 seconds.
>>>
>>> B) In the event of (A), trigger a notification in QEMU. This is easy
>>> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
>>> revisit usage of the in-kernel PIT?
>>>
>>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>>> force a resync of the guest's time to the hardware wallclock time.
>>>
>>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>> accumulated ticks.
>> D) makes no sense, see my comment above.
>>
>> Injection of additional timer interrupts should not be needed
>> after a hibernation. The guest must handle that situation
>> by reading either the hw clock (which must be updated
>> by QEMU when it resumes from hibernate) or by using
>> another time reference (like NTP, for example).
>>
> He is talking about host hibernation, not guest.
>
I also meant host hibernation.
Maybe the host should tell the guest that it is going to
hibernate (ACPI event), then the guest can use its
normal hibernate entry and recovery code, too.
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Clemens Kolbitsch @ 2012-09-12 18:03 UTC
To: qemu-devel
> On 2012-09-12 15:54, Anthony Liguori wrote:
>>
>> Hi,
>>
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use. Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>>
>> Here are the problems we're seeing:
>>
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>> time. We've seen a number of RTC watchdog BSoDs and it's possible
>> that at least one cause is reinjection speed.
>>
>> 2) When hibernating a host system, the guest is essentially paused
>> for a long period of time. This results in a very large tick catchup
>> while also resulting in a large skew in guest time.
>>
>> I've gotten reports of the tick catchup consuming a lot of CPU time
>> from rapid delivery of interrupts (although I haven't reproduced this
>> yet).
Guys,
there's not much I can contribute to solving the problem, but I have a
bunch of VMs where this happens _every_ time I resume a snapshot (but
without hibernating). In case this is a related problem and you need
help testing a patch, I'm more than happy to help.
-Clemens
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Gleb Natapov @ 2012-09-12 18:13 UTC
To: Stefan Weil
Cc: Michael Roth, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
> Am 12.09.2012 18:45, schrieb Gleb Natapov:
> >On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> >>Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >>>Hi,
> >>>
> >>>We've been running into a lot of problems lately with Windows guests and
> >>>I think they all ultimately could be addressed by revisiting the missed
> >>>tick catchup algorithms that we use. Mike and I spent a while talking
> >>>about it yesterday and I wanted to take the discussion to the list to
> >>>get some additional input.
> >>>
> >>>Here are the problems we're seeing:
> >>>
> >>>1) Rapid reinjection can lead to time moving faster for short bursts of
> >>> time. We've seen a number of RTC watchdog BSoDs and it's possible
> >>> that at least one cause is reinjection speed.
> >>>
> >>>2) When hibernating a host system, the guest is essentially paused
> >>> for a long period of time. This results in a very large tick catchup
> >>> while also resulting in a large skew in guest time.
> >>>
> >>> I've gotten reports of the tick catchup consuming a lot of CPU time
> >>> from rapid delivery of interrupts (although I haven't reproduced this
> >>> yet).
> >>>
> >>>3) Windows appears to have a service that periodically syncs the guest
> >>> time with the hardware clock. I've been told the resync period is an
> >>> hour. For large clock skews, this can compete with reinjection
> >>> resulting in a positive skew in time (the guest can be ahead of the
> >>> host).
> >>Nearly every modern OS (including Windows) uses NTP
> >>or some other protocol to get the time over the network.
> >>
> >The drifts we are talking about will take ages for NTP to fix.
> >
> >>If a guest OS detects a small difference of time, it will usually
> >>accelerate or decelerate the OS clock until the time is
> >>synchronised again.
> >>
> >>Large jumps in network time will make the OS time jump, too.
> >>With a little bad luck, QEMU's reinjection will add the
> >>positive skew, no matter whether the guest is Linux or Windows.
> >>
> >As far as I know NTP will never make the OS clock jump. The purpose of
> >NTP is to fix time gradually, so apps will not notice. npdate is used to
> >force clock synchronization, but it should be run manually.
>
> s/npdate/ntpdate. Yes, some Linux distros run it at system start,
Yes, typo.
> and it's also usual to call it every hour (poor man's NTP, uses
> less resources).
>
> >
> >>>I've been thinking about an algorithm like this to address these
> >>>problems:
> >>>
> >>>A) Limit the number of interrupts that we reinject to the equivalent of
> >>> a small period of wallclock time. Something like 60 seconds.
> >>>
> >>>B) In the event of (A), trigger a notification in QEMU. This is easy
> >>> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> >>> revisit usage of the in-kernel PIT?
> >>>
> >>>C) On accumulated tick overflow, rely on using a qemu-ga command to
> >>> force a resync of the guest's time to the hardware wallclock time.
> >>>
> >>>D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>> accumulated ticks.
> >>D) makes no sense, see my comment above.
> >>
> >>Injection of additional timer interrupts should not be needed
> >>after a hibernation. The guest must handle that situation
> >>by reading either the hw clock (which must be updated
> >>by QEMU when it resumes from hibernate) or by using
> >>another time reference (like NTP, for example).
> >>
> >He is talking about host hibernation, not guest.
> >
>
> I also meant host hibernation.
Then I don't see how the guest can handle the situation, since it has
no idea that it was stopped. QEMU has no idea about host hibernation
either.
>
> Maybe the host should tell the guest that it is going to
> hibernate (ACPI event), then the guest can use its
> normal hibernate entry and recovery code, too.
QEMU does not emulate a sleep button, but even if it did, the guest may
ignore it. AFAIK libvirt migrates the VM into a file during host
hibernation. While this does not require guest cooperation, it has
timekeeping issues.
--
Gleb.
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Anthony Liguori @ 2012-09-12 18:19 UTC
To: Gleb Natapov
Cc: Michael Roth, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
Gleb Natapov <gleb@redhat.com> writes:
> On Wed, Sep 12, 2012 at 08:54:26AM -0500, Anthony Liguori wrote:
>>
>> Hi,
>>
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use. Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>>
>> Here are the problems we're seeing:
>>
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>> time. We've seen a number of RTC watchdog BSoDs and it's possible
>> that at least one cause is reinjection speed.
>>
>> 2) When hibernating a host system, the guest is essentially paused
>> for a long period of time. This results in a very large tick catchup
>> while also resulting in a large skew in guest time.
>>
>> I've gotten reports of the tick catchup consuming a lot of CPU time
>> from rapid delivery of interrupts (although I haven't reproduced this
>> yet).
>>
>> 3) Windows appears to have a service that periodically syncs the guest
>> time with the hardware clock. I've been told the resync period is an
>> hour. For large clock skews, this can compete with reinjection
>> resulting in a positive skew in time (the guest can be ahead of the
>> host).
>>
>> I've been thinking about an algorithm like this to address these
>> problems:
>>
>> A) Limit the number of interrupts that we reinject to the equivalent of
>> a small period of wallclock time. Something like 60 seconds.
>>
> How will this fix the BSOD problem, for instance? 60 seconds is long
> enough to cause all the problems you are talking about above. We can
> easily make the amount of accumulated ticks configurable, though, to
> play with and see.
It won't, but the goal of an upper limit is to cap time correction at
something reasonably caused by overcommit, not by suspend/resume.
60 seconds is probably way too long. Maybe 5 seconds? We can try
various amounts, as you said.
What do you think about slowing down the catchup rate? I think right now
we increase wallclock time by 100-700%.
That is very fast. I wonder if this still makes sense now that
high-resolution timers are pretty much ubiquitous.
I think we could probably just increase wallclock time by as little
as 10-20%. That should avoid false watchdog alerts but still give us a
chance to inject enough interrupts.
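As a sketch of what arming the catchup timer at the slower rate would
look like (numbers illustrative, not the current RTC code):

#include <stdint.h>

static int64_t next_tick_delay_ns(int64_t nominal_ns, int64_t backlog)
{
    const int64_t catchup_pct = 20;   /* 10-20% as suggested above */

    if (backlog <= 0)
        return nominal_ns;            /* caught up: nominal rate */
    /* 20% faster: the period shrinks to 100/120 of nominal. */
    return nominal_ns * 100 / (100 + catchup_pct);
}

At 20%, a minute of backlog drains in about five minutes, which keeps
the apparent clock rate well inside what a guest watchdog should
tolerate.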
>
>> B) In the event of (A), trigger a notification in QEMU. This is easy
>> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
>> revisit usage of the in-kernel PIT?
>>
> PIT does not matter for Windows guests.
>
>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>> force a resync of the guest's time to the hardware wallclock time.
>>
> Needs guest cooperation.
Yes, hence qemu-ga. But is there any other choice? Hibernation can
cause us to miss an unbounded number of ticks: days' worth of time. It
seems unreasonable to gradually catch up that much time.
>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>> accumulated ticks.
>>
>> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
>> discussed a low-impact way of doing this (having a separate dispatch
>> path for guest agent commands) and I'm confident we could do this for
>> 1.3.
>>
>> This would mean that management tools would need to consume qemu-ga
>> through QMP. Not sure if this is a problem for anyone.
>>
>> I'm not sure whether it's worth trying to support this with the
>> in-kernel PIT or not either.
>>
>> Are there other issues with reinjection that people are aware of? Does
>> anything seem obviously wrong with the above?
>>
> It looks like you are trying to solve only pathologically big time drift
> problems. Those do not happen normally.
They do if you hibernate your laptop.
Regards,
Anthony Liguori
>
> --
> Gleb.
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Stefan Weil @ 2012-09-12 19:45 UTC
To: Gleb Natapov
Cc: Michael Roth, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini, Eric Blake
Am 12.09.2012 20:13, schrieb Gleb Natapov:
> On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
>> I also meant host hibernation.
> Then I don't see how the guest can handle the situation, since it has
> no idea that it was stopped. QEMU has no idea about host hibernation
> either.
The guest can compare its internal timers (which stop
when the host hibernates) with external time references
(NTP, hw clock or any other clock reference).
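Roughly like this, assuming the guest's monotonic clock stalls while
the VM is paused and that an external reference is available
(CLOCK_REALTIME stands in below for the RTC or NTP):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double ts_diff(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec m0, r0, m1, r1;

    clock_gettime(CLOCK_MONOTONIC, &m0);
    clock_gettime(CLOCK_REALTIME, &r0);

    sleep(60);   /* ...the host may hibernate somewhere in here... */

    clock_gettime(CLOCK_MONOTONIC, &m1);
    clock_gettime(CLOCK_REALTIME, &r1);

    /* Wallclock advanced but monotonic time did not: we were paused. */
    double lost = ts_diff(r0, r1) - ts_diff(m0, m1);
    if (lost > 5.0)
        printf("lost %.1f s while paused; step the clock\n", lost);
    return 0;
}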
* Re: [Qemu-devel] Rethinking missed tick catchup
From: Michael Roth @ 2012-09-12 20:06 UTC
To: Stefan Weil
Cc: Gleb Natapov, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
> Am 12.09.2012 18:45, schrieb Gleb Natapov:
> >On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> >>Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >>>Hi,
> >>>
> >>>We've been running into a lot of problems lately with Windows guests and
> >>>I think they all ultimately could be addressed by revisiting the missed
> >>>tick catchup algorithms that we use. Mike and I spent a while talking
> >>>about it yesterday and I wanted to take the discussion to the list to
> >>>get some additional input.
> >>>
> >>>Here are the problems we're seeing:
> >>>
> >>>1) Rapid reinjection can lead to time moving faster for short bursts of
> >>> time. We've seen a number of RTC watchdog BSoDs and it's possible
> >>> that at least one cause is reinjection speed.
> >>>
> >>>2) When hibernating a host system, the guest is essentially paused
> >>> for a long period of time. This results in a very large tick catchup
> >>> while also resulting in a large skew in guest time.
> >>>
> >>> I've gotten reports of the tick catchup consuming a lot of CPU time
> >>> from rapid delivery of interrupts (although I haven't reproduced this
> >>> yet).
> >>>
> >>>3) Windows appears to have a service that periodically syncs the guest
> >>> time with the hardware clock. I've been told the resync period is an
> >>> hour. For large clock skews, this can compete with reinjection
> >>> resulting in a positive skew in time (the guest can be ahead of the
> >>> host).
> >>Nearly every modern OS (including Windows) uses NTP
> >>or some other protocol to get the time over the network.
> >>
> >The drifts we are talking about will take ages for NTP to fix.
> >
> >>If a guest OS detects a small difference of time, it will usually
> >>accelerate or decelerate the OS clock until the time is
> >>synchronised again.
> >>
> >>Large jumps in network time will make the OS time jump, too.
> >>With a little bad luck, QEMU's reinjection will add the
> >>positive skew, no matter whether the guest is Linux or Windows.
> >>
> >As far as I know NTP will never make the OS clock jump. The purpose of NTP
> >is to fix time gradually, so apps will not notice. npdate is used to
> >force clock synchronization, but it should be run manually.
>
> s/npdate/ntpdate. Yes, some Linux distros run it at system start,
> and it's also usual to call it every hour (poor man's NTP, uses
> fewer resources).
Windows at least seems to generally default to a max correction of +/- 15
hours using this approach. The relevant registry values are listed here:
http://support.microsoft.com/kb/816042#method4 (under More Information)
On my Win7 instance I have:
MaxPosPhaseCorrection: 54000 (15 hours)
MaxNegPhaseCorrection: 54000 (15 hours)
So there are definitely situations where guests won't correct themselves
even with NTP or ntpdate-like services running.
Also:
MaxAllowedPhaseOffset: 1 (1 second)
So Windows won't attempt to "catch-up" via increased tickrate if the
delta is greater than 1 second, and will instead try to reset the clock
directly. Which is basically the policy we're looking to implement,
except from the host-side.
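Transposed to the host side, the policy looks roughly like this sketch
(the thresholds are the Win7 defaults quoted above; the function itself
is made up for illustration):

    #include <stdlib.h>

    #define MAX_PHASE_CORRECTION (15 * 3600) /* MaxPos/NegPhaseCorrection, s */
    #define MAX_ALLOWED_OFFSET   1           /* MaxAllowedPhaseOffset, s */

    enum action { SLEW, STEP, GIVE_UP };

    /* offset_sec = guest time minus reference time */
    static enum action correct(long offset_sec)
    {
        if (labs(offset_sec) > MAX_PHASE_CORRECTION) {
            return GIVE_UP;   /* Windows refuses to correct at all */
        }
        if (labs(offset_sec) > MAX_ALLOWED_OFFSET) {
            return STEP;      /* set the clock directly */
        }
        return SLEW;          /* catch up by adjusting the tick rate */
    }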
>
> >
> >>>I've been thinking about an algorithm like this to address these
> >>>problems:
> >>>
> >>>A) Limit the number of interrupts that we reinject to the equivalent of
> >>> a small period of wallclock time. Something like 60 seconds.
> >>>
> >>>B) In the event of (A), trigger a notification in QEMU. This is easy
> >>> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> >>> revisit usage of the in-kernel PIT?
> >>>
> >>>C) On accumulated tick overflow, rely on using a qemu-ga command to
> >>> force a resync of the guest's time to the hardware wallclock time.
> >>>
> >>>D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>> accumulated ticks.
> >>D) makes no sense, see my comment above.
> >>
> >>Injection of additional timer interrupts should not be needed
> >>after a hibernation. The guest must handle that situation
> >>by reading either the hw clock (which must be updated
> >>by QEMU when it resumes from hibernate) or by using
> >>another time reference (like NTP, for example).
> >>
> >He is talking about host hibernation, not guest.
> >
>
> I also meant host hibernation.
>
> Maybe the host should tell the guest that it is going to
> hibernate (ACPI event), then the guest can use its
> normal hibernate entry and recovery code, too.
>
I think doing that would be useful either way, but aren't there other
scenarios where big time jumps can occur? What about live migration?
Presumably we'd complete within the 15 hour limit above, but for other
operating systems or particular configurations thereof we might still
fall outside the threshold they're willing to correct for. At least with
an approach like this we can clearly define the requirements for proper
time-keeping.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-12 18:03 Clemens Kolbitsch
@ 2012-09-13 6:25 ` Paolo Bonzini
0 siblings, 0 replies; 48+ messages in thread
From: Paolo Bonzini @ 2012-09-13 6:25 UTC (permalink / raw)
To: qemu-devel
On 12/09/2012 20:03, Clemens Kolbitsch wrote:
>
> not much that I can contribute to solving the problem, but I have a
> bunch of VMs where this happens _every_ time I resume a snapshot (but
> without hibernating). In case this could be a connected problem and
> you need help testing a patch, I'm more than happy to help.
Resuming from a snapshot is the same as hibernating, from the guest's POV.
To fix it, you need to suspend the VM to S3 before saving a snapshot
and/or hibernating.
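Assuming the guest runs qemu-ga (which already has suspend commands),
the sequence would be roughly: send

    {"execute": "guest-suspend-ram"}

over the agent channel, take the snapshot with savevm once the guest is
in S3, and wake the guest with system_wakeup after loading the snapshot
back.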
Paolo
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-12 18:19 ` Anthony Liguori
@ 2012-09-13 10:49 ` Gleb Natapov
2012-09-13 13:14 ` Eric Blake
0 siblings, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2012-09-13 10:49 UTC (permalink / raw)
To: Anthony Liguori
Cc: Michael Roth, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 01:19:17PM -0500, Anthony Liguori wrote:
> Gleb Natapov <gleb@redhat.com> writes:
>
> > On Wed, Sep 12, 2012 at 08:54:26AM -0500, Anthony Liguori wrote:
> >>
> >> Hi,
> >>
> >> We've been running into a lot of problems lately with Windows guests and
> >> I think they all ultimately could be addressed by revisiting the missed
> >> tick catchup algorithms that we use. Mike and I spent a while talking
> >> about it yesterday and I wanted to take the discussion to the list to
> >> get some additional input.
> >>
> >> Here are the problems we're seeing:
> >>
> >> 1) Rapid reinjection can lead to time moving faster for short bursts of
> >> time. We've seen a number of RTC watchdog BSoDs and it's possible
> >> that at least one cause is reinjection speed.
> >>
> >> 2) When hibernating a host system, the guest is essentially paused
> >> for a long period of time. This results in a very large tick catchup
> >> while also resulting in a large skew in guest time.
> >>
> >> I've gotten reports of the tick catchup consuming a lot of CPU time
> >> from rapid delivery of interrupts (although I haven't reproduced this
> >> yet).
> >>
> >> 3) Windows appears to have a service that periodically syncs the guest
> >> time with the hardware clock. I've been told the resync period is an
> >> hour. For large clock skews, this can compete with reinjection
> >> resulting in a positive skew in time (the guest can be ahead of the
> >> host).
> >>
> >> I've been thinking about an algorithm like this to address these
> >> problems:
> >>
> >> A) Limit the number of interrupts that we reinject to the equivalent of
> >> a small period of wallclock time. Something like 60 seconds.
> >>
> > How will this fix the BSOD problem, for instance? 60 seconds is long enough
> > to cause all the problems you are talking about above. We can make the
> > amount of accumulated ticks easily configurable though, to play with and
> > see.
>
> It won't, but the goal of an upper limit is to cap time correction at
> something reasonably caused by overcommit, not by suspend/resume.
>
> 60 seconds is probably way too long. Maybe 5 seconds? We can try
> various amounts as you said.
>
> What do you think about slowing down the catchup rate? I think now we
> increase wallclock time by 100-700%.
>
Now we reinject up to 20 lost ticks on guest interrupt acknowledgement
(RTC register C read) and increase the frequency, as you say, if this is
not enough. We do both because on machines without hr timers we cannot
increase the frequency if the guest sets the RTC to 1kHz, and injecting
a lot of RTC interrupts at once makes Windows think that the RTC irq
line is stuck -> BSOD.
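In sketch form (illustrative names and structure, not the actual
mc146818rtc code):

    #include <stdint.h>

    #define MAX_REINJECT_ON_ACK 20   /* cap on the ACK-chained burst */

    static int64_t coalesced_irqs;   /* ticks we failed to deliver */

    /* Called when the guest reads RTC register C, i.e. acknowledges
     * the previous interrupt.  If we still owe ticks, chain one more
     * interrupt right away; the cap keeps the burst short enough that
     * the guest never decides the irq line is stuck. */
    static void rtc_irq_ack(void (*inject_irq)(void))
    {
        static int burst;

        if (coalesced_irqs > 0 && burst < MAX_REINJECT_ON_ACK) {
            coalesced_irqs--;
            burst++;
            inject_irq();   /* the guest will ack this one too */
        } else {
            burst = 0;      /* burst over; slewing handles the rest */
        }
    }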
> This is very fast. I wonder if this makes sense anymore since hr timers
> are pretty much ubiquitous.
We can drop reinject-on-ACK if we do not want to support old kernels.
The frequency increase was arbitrary; we can make it smaller, but we have
to make sure that under load the drift will not be stronger than our
attempts to fix it.
>
> I think we could probably even just increase wallclock time by as little
> as 10-20%. That should avoid false watchdog alerts but still give us a
> chance to inject enough interrupts.
We can start from 10-20% and, if the coalesced counter still grows,
increase it.
>
> >
> >> B) In the event of (A), trigger a notification in QEMU. This is easy
> >> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> >> revisit usage of the in-kernel PIT?
> >>
> > PIT does not matter for Windows guests.
> >
> >> C) On accumulated tick overflow, rely on using a qemu-ga command to
> >> force a resync of the guest's time to the hardware wallclock time.
> >>
> > Needs guest cooperation.
>
> Yes, hence qemu-ga. But is there any other choice? Hibernation can
> cause us to miss an unbounded number of ticks. Days worth of time. It
> seems unreasonable to gradually catch up that much time.
The timedrift fix was never meant to fix timedrifts from vmstop. This is
a side effect of making the RTC use the real time clock instead of the vm
clock. With the RTC using the real time clock, on resume qemu_timer tries
to catch up with the current time and fires the timer callback for each
lost tick. They are all coalesced of course, since the guest has no chance
to run between them, and accumulated into the coalesced_irq counter. If
you configure the RTC to use the vm clock you should not see this.
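(That is the -rtc option; something like

    qemu-system-x86_64 -rtc base=utc,clock=vm,driftfix=slew ...

should do it. With clock=vm the RTC follows the vm clock, which stops
whenever the VM is stopped.)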
I agree with you of course that qemu-ga is the only sane way to fix time
drift due to vmstop, but better to not be in this situation if possible.
See below.
>
> >> D) Whenever the guest reads the wallclock time from the RTC, reset all
> >> accumulated ticks.
> >>
> >> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> >> discussed a low-impact way of doing this (having a separate dispatch
> >> path for guest agent commands) and I'm confident we could do this for
> >> 1.3.
> >>
> >> This would mean that management tools would need to consume qemu-ga
> >> through QMP. Not sure if this is a problem for anyone.
> >>
> >> I'm not sure whether it's worth trying to support this with the
> >> in-kernel PIT or not either.
> >>
> >> Are there other issues with reinjection that people are aware of? Does
> >> anything seem obviously wrong with the above?
> >>
> > It looks like you are trying to solve only pathologically big timedrift
> > problems. Those do not happen normally.
>
> They do if you hibernate your laptop.
>
AFAIK libvirt migrates the vm into a file on hibernate. It is better to
move the guest to S3 (using qemu-ga) instead and migrate to file only if
S3 fails.
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-12 19:45 ` Stefan Weil
@ 2012-09-13 10:50 ` Gleb Natapov
0 siblings, 0 replies; 48+ messages in thread
From: Gleb Natapov @ 2012-09-13 10:50 UTC (permalink / raw)
To: Stefan Weil
Cc: Michael Roth, Jan Kiszka, qemu-devel, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini, Eric Blake
On Wed, Sep 12, 2012 at 09:45:49PM +0200, Stefan Weil wrote:
> On 12.09.2012 20:13, Gleb Natapov wrote:
> >On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
> >>I also meant host hibernation.
> >Then I don't see how the guest can handle the situation, since it has
> >no idea that it was stopped. Qemu has no idea about host hibernation
> >either.
>
> The guest can compare its internal timers (which stop
> when the host hibernates) with external time references
> (NTP, hw clock or any other clock reference).
Are you proposing to run ntpdate in a loop inside a guest?
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 10:49 ` Gleb Natapov
@ 2012-09-13 13:14 ` Eric Blake
2012-09-13 13:28 ` Daniel P. Berrange
2012-09-13 13:47 ` Gleb Natapov
0 siblings, 2 replies; 48+ messages in thread
From: Eric Blake @ 2012-09-13 13:14 UTC (permalink / raw)
To: Gleb Natapov
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini
On 09/13/2012 04:49 AM, Gleb Natapov wrote:
>> They do if you hibernate your laptop.
>>
> AFAIK libvirt migrates vm into a file on hibernate. It is better to move to S3
> (using qemu-ga) instead and migrate to file only if s3 fails.
On host hibernate, libvirt currently does nothing to the guest. When
the host resumes, the guests see a large gap in execution.
Libvirt would need a hook into host hibernation, to have enough time to
tell the guests to go into S3 prior to allowing the host to go into S3.
On host reboot, libvirt currently saves guests to disk using migrate to
file. The ideal solution would be to first tell the guest to go into S3
before migrating to file, but the migration to file STILL must occur,
because the host is about to reboot and S3 is not persistent. S3 is a
better solution than S4, in that S4 requires the guest to have enough
memory (and if it doesn't cooperate, data is lost), but with S3, even if
the guest doesn't cooperate, we can still fall back to migration to file
with the guest only losing time, but not data.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 13:14 ` Eric Blake
@ 2012-09-13 13:28 ` Daniel P. Berrange
2012-09-13 14:06 ` Anthony Liguori
2012-09-13 13:47 ` Gleb Natapov
1 sibling, 1 reply; 48+ messages in thread
From: Daniel P. Berrange @ 2012-09-13 13:28 UTC (permalink / raw)
To: Eric Blake
Cc: Michael Roth, Gleb Natapov, Jan Kiszka, qemu-devel,
Luiz Capitulino, Avi Kivity, Anthony Liguori, Paolo Bonzini
On Thu, Sep 13, 2012 at 07:14:08AM -0600, Eric Blake wrote:
> On 09/13/2012 04:49 AM, Gleb Natapov wrote:
> >> They do if you hibernate your laptop.
> >>
> > AFAIK libvirt migrates vm into a file on hibernate. It is better to move to S3
> > (using qemu-ga) instead and migrate to file only if s3 fails.
>
> On host hibernate, libvirt currently does nothing to the guest. When
> the host resumes, the guests see a large gap in execution.
>
> Libvirt would need a hook into host hibernation, to have enough time to
> tell the guests to go into S3 prior to allowing the host to go into S3.
>
> On host reboot, libvirt currently saves guests to disk using migrate to
> file. The ideal solution would be to first tell the guest to go into S3
> before migrating to file, but the migration to file STILL must occur,
> because the host is about to reboot and S3 is not persistent. S3 is a
> better solution than S4, in that S4 requires the guest to have enough
> memory (and if it doesn't cooperate, data is lost), but with S3, even if
> the guest doesn't cooperate, we can still fall back to migration to file
> with the guest only losing time, but not data.
Trying to hook into host S3/S4 and do magic to the guests is just
asking for trouble. Not only can it arbitrarily delay the host going
into S3/S4, but it is not reliable in general, even for OSes which do
support it. Much better off hooking into the resume path on the host
and issuing a QEMU GA call to each running guest to resync their
clocks.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 13:14 ` Eric Blake
2012-09-13 13:28 ` Daniel P. Berrange
@ 2012-09-13 13:47 ` Gleb Natapov
1 sibling, 0 replies; 48+ messages in thread
From: Gleb Natapov @ 2012-09-13 13:47 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Anthony Liguori, Paolo Bonzini
On Thu, Sep 13, 2012 at 07:14:08AM -0600, Eric Blake wrote:
> On 09/13/2012 04:49 AM, Gleb Natapov wrote:
> >> They do if you hibernate your laptop.
> >>
> > AFAIK libvirt migrates vm into a file on hibernate. It is better to move to S3
> > (using qemu-ga) instead and migrate to file only if s3 fails.
>
> On host hibernate, libvirt currently does nothing to the guest. When
> the host resumes, the guests see a large gap in execution.
>
> Libvirt would need a hook into host hibernation, to have enough time to
> tell the guests to go into S3 prior to allowing the host to go into S3.
>
> On host reboot, libvirt currently saves guests to disk using migrate to
> file. The ideal solution would be to first tell the guest to go into S3
> before migrating to file, but the migration to file STILL must occur,
> because the host is about to reboot and S3 is not persistent. S3 is a
> better solution than S4, in that S4 requires the guest to have enough
> memory (and if it doesn't cooperate, data is lost), but with S3, even if
> the guest doesn't cooperate, we can still fall back to migration to file
> with the guest only losing time, but not data.
>
Correct, after S3 libvirt needs to migrate to file. So my AFAIK was
incorrect. Is it possible to hook into host hibernation?
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 13:28 ` Daniel P. Berrange
@ 2012-09-13 14:06 ` Anthony Liguori
2012-09-13 14:22 ` Gleb Natapov
0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2012-09-13 14:06 UTC (permalink / raw)
To: Daniel P. Berrange, Eric Blake
Cc: Michael Roth, Gleb Natapov, Jan Kiszka, qemu-devel,
Luiz Capitulino, Avi Kivity, Paolo Bonzini
"Daniel P. Berrange" <berrange@redhat.com> writes:
> On Thu, Sep 13, 2012 at 07:14:08AM -0600, Eric Blake wrote:
>> On 09/13/2012 04:49 AM, Gleb Natapov wrote:
>> >> They do if you hibernate your laptop.
>> >>
>> > AFAIK libvirt migrates vm into a file on hibernate. It is better to move to S3
>> > (using qemu-ga) instead and migrate to file only if s3 fails.
>>
>> On host hibernate, libvirt currently does nothing to the guest. When
>> the host resumes, the guests see a large gap in execution.
>>
>> Libvirt would need a hook into host hibernation, to have enough time to
>> tell the guests to go into S3 prior to allowing the host to go into S3.
>>
>> On host reboot, libvirt currently saves guests to disk using migrate to
>> file. The ideal solution would be to first tell the guest to go into S3
>> before migrating to file, but the migration to file STILL must occur,
>> because the host is about to reboot and S3 is not persistent. S3 is a
>> better solution than S4, in that S4 requires the guest to have enough
>> memory (and if it doesn't cooperate, data is lost), but with S3, even if
>> the guest doesn't cooperate, we can still fall back to migration to file
>> with the guest only losing time, but not data.
>
> Trying to hook into host S3/S4 and do magic to the guests is just
> asking for trouble. Not only can it arbitrarily delay the host going
> into S3/S4, but it is not reliable in general, even for OS which do
> support it. Much better off hooking into the resume path on the host
> and issuing a QEMU GA call to each running guest to resync their
> clocks
I think it's better for QEMU to talk to qemu-ga. We can tell when a large
period of time has passed in QEMU because we'll accumulate a large
number of missed ticks.
This could happen because of stop, host suspend, live migration to a
file, etc.
It's much easier for us to call into qemu-ga to do the time correction
whenever this event occurs than to try and have libvirt figure out when
it's necessary.
We know exactly when it's necessary, libvirt would need to guess.
Yes, we could generate a QMP event when a large skew was detected, but
I think this could happen often enough that it would be problematic.
Since QEMU is already implementing policy by doing timer catchup in the
first place, I think we probably should own the time catchup policy entirely.
Regards,
Anthony Liguori
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 14:06 ` Anthony Liguori
@ 2012-09-13 14:22 ` Gleb Natapov
2012-09-13 14:34 ` Avi Kivity
2012-09-13 14:35 ` Anthony Liguori
0 siblings, 2 replies; 48+ messages in thread
From: Gleb Natapov @ 2012-09-13 14:22 UTC (permalink / raw)
To: Anthony Liguori
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
> "Daniel P. Berrange" <berrange@redhat.com> writes:
>
> > On Thu, Sep 13, 2012 at 07:14:08AM -0600, Eric Blake wrote:
> >> On 09/13/2012 04:49 AM, Gleb Natapov wrote:
> >> >> They do if you hibernate your laptop.
> >> >>
> >> > AFAIK libvirt migrates vm into a file on hibernate. It is better to move to S3
> >> > (using qemu-ga) instead and migrate to file only if s3 fails.
> >>
> >> On host hibernate, libvirt currently does nothing to the guest. When
> >> the host resumes, the guests see a large gap in execution.
> >>
> >> Libvirt would need a hook into host hibernation, to have enough time to
> >> tell the guests to go into S3 prior to allowing the host to go into S3.
> >>
> >> On host reboot, libvirt currently saves guests to disk using migrate to
> >> file. The ideal solution would be to first tell the guest to go into S3
> >> before migrating to file, but the migration to file STILL must occur,
> >> because the host is about to reboot and S3 is not persistent. S3 is a
> >> better solution than S4, in that S4 requires the guest to have enough
> >> memory (and if it doesn't cooperate, data is lost), but with S3, even if
> >> the guest doesn't cooperate, we can still fall back to migration to file
> >> with the guest only losing time, but not data.
> >
> > Trying to hook into host S3/S4 and do magic to the guests is just
> > asking for trouble. Not only can it arbitrarily delay the host going
> > into S3/S4, but it is not reliable in general, even for OS which do
> > support it. Much better off hooking into the resume path on the host
> > and issuing a QEMU GA call to each running guest to resync their
> > clocks
>
> I think it's better for QEMU to talk to qemu-ga. We can tell when a large
> period of time has passed in QEMU because we'll accumulate a large
> number of missed ticks.
>
With RTC configured to use vm clock we will not.
> This could happen because of stop, host suspend, live migration to a
> file, etc.
>
> It's much easier for us to call into qemu-ga to do the time correction
> whenever this event occurs than to try and have libvirt figure out when
> it's necessary.
And if the guest does not have qemu-ga, what is better: inject interrupts
like crazy for the next 2 minutes, or leave the guest with incorrect time?
>
> We know exactly when it's necessary, libvirt would need to guess.
>
> > Yes, we could generate a QMP event when a large skew was detected, but
> I think this could happen often enough that it would be problematic.
> Since QEMU is already implementing policy doing timer catchup in the
> first place, I think we probably should own time catchup policy entirely.
>
> Regards,
>
> Anthony Liguori
>
> >
> > Regards,
> > Daniel
> > --
> > |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> > |: http://libvirt.org -o- http://virt-manager.org :|
> > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> > |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 14:22 ` Gleb Natapov
@ 2012-09-13 14:34 ` Avi Kivity
2012-09-13 14:42 ` Eric Blake
2012-09-13 14:35 ` Anthony Liguori
1 sibling, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2012-09-13 14:34 UTC (permalink / raw)
To: Gleb Natapov
Cc: Jan Kiszka, Michael Roth, qemu-devel, Anthony Liguori,
Paolo Bonzini, Luiz Capitulino, Eric Blake
On 09/13/2012 05:22 PM, Gleb Natapov wrote:
>>
>> It's much easier for us to call into qemu-ga to do the time correction
>> whenever this event occurs than to try and have libvirt figure out when
>> it's necessary.
> And if guest does not have qemu-ga what is better inject interrupts like
> crazy for next 2 minutes or leave guest with incorrect time?
We can try to S3 and resume the guest. The guest will assume that it
has slept for an unknown amount of time and resync from the RTC.
This may not work for really old server-oriented guests.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 14:22 ` Gleb Natapov
2012-09-13 14:34 ` Avi Kivity
@ 2012-09-13 14:35 ` Anthony Liguori
2012-09-13 14:48 ` Gleb Natapov
1 sibling, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2012-09-13 14:35 UTC (permalink / raw)
To: Gleb Natapov
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
Gleb Natapov <gleb@redhat.com> writes:
> On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
>> "Daniel P. Berrange" <berrange@redhat.com> writes:
>>
>> I think it's better for QEMU to talk to qemu-ga. We can tell when a large
>> period of time has passed in QEMU because we'll accumulate a large
>> number of missed ticks.
>>
> With RTC configured to use vm clock we will not.
Not for host suspend. For stop and live migration, we stop vm_clock.
But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
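One way QEMU could in principle notice (a sketch, not existing code):
on Linux, CLOCK_BOOTTIME keeps advancing across suspend while
CLOCK_MONOTONIC does not, so the difference between the two grows by
exactly the time the host spent asleep:

    #include <stdint.h>
    #include <time.h>

    /* Sample this periodically; a jump since the previous sample means
     * the host suspended for about that long in between.
     * CLOCK_BOOTTIME needs Linux >= 2.6.39. */
    static int64_t boot_minus_mono_ns(void)
    {
        struct timespec b, m;

        clock_gettime(CLOCK_BOOTTIME, &b);
        clock_gettime(CLOCK_MONOTONIC, &m);
        return (int64_t)(b.tv_sec - m.tv_sec) * 1000000000LL
               + (b.tv_nsec - m.tv_nsec);
    }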
>> This could happen because of stop, host suspend, live migration to a
>> file, etc.
>>
>> It's much easier for us to call into qemu-ga to do the time correction
>> whenever this event occurs than to try and have libvirt figure out when
>> it's necessary.
> And if guest does not have qemu-ga what is better inject interrupts like
> crazy for next 2 minutes or leave guest with incorrect time?
Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
for a prolonged period of time isn't fixable.
Regards,
Anthony Liguori
>
>>
>> We know exactly when it's necessary, libvirt would need to guess.
>>
>> Yes, we could generate a QMP event when a large skew was detected, but
>> I think this could happen often enough that it would be problematic.
>> Since QEMU is already implementing policy doing timer catchup in the
>> first place, I think we probably should own time catchup policy entirely.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>> >
>> > Regards,
>> > Daniel
>> > --
>> > |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
>> > |: http://libvirt.org -o- http://virt-manager.org :|
>> > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
>> > |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 14:34 ` Avi Kivity
@ 2012-09-13 14:42 ` Eric Blake
2012-09-13 15:40 ` Avi Kivity
0 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2012-09-13 14:42 UTC (permalink / raw)
To: Avi Kivity
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel,
Anthony Liguori, Paolo Bonzini, Luiz Capitulino
On 09/13/2012 08:34 AM, Avi Kivity wrote:
> On 09/13/2012 05:22 PM, Gleb Natapov wrote:
>>>
>>> It's much easier for us to call into qemu-ga to do the time correction
>>> whenever this event occurs than to try and have libvirt figure out when
>>> it's necessary.
>
>> And if guest does not have qemu-ga what is better inject interrupts like
>> crazy for next 2 minutes or leave guest with incorrect time?
>
> We can try to S3 and resume the guest. The guest will assume that it
> has slept for an unknown amount of time and resync from the RTC.
Just restating to make sure I'm clear: you are proposing that after host
suspend and host wakeup, _then_ qemu asks the guest to go into S3
followed by an immediate resume. During the time that the guest goes
into S3, its clock is way off; but then the immediate resume will be the
necessary kick to the guest to resync, and at that point, the guest can
assume that it has been off since the host originally suspended (rather
than the instant window where qemu actually bounced S3 after the host
had already resumed).
> This may not work for really old server oriented guests.
S3 requires guest cooperation, period. But so does qemu-ga. It's
better than nothing, and we can't get perfection without guest cooperation.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 14:35 ` Anthony Liguori
@ 2012-09-13 14:48 ` Gleb Natapov
2012-09-13 15:51 ` Avi Kivity
2012-09-13 15:56 ` Anthony Liguori
0 siblings, 2 replies; 48+ messages in thread
From: Gleb Natapov @ 2012-09-13 14:48 UTC (permalink / raw)
To: Anthony Liguori
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
> Gleb Natapov <gleb@redhat.com> writes:
>
> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
> >> "Daniel P. Berrange" <berrange@redhat.com> writes:
> >>
> >> I think it's better for QEMU to talk to qemu-ga. We can tell when a large
> >> period of time has passed in QEMU because we'll accumulate a large
> >> number of missed ticks.
> >>
> > With RTC configured to use vm clock we will not.
>
> Not for host suspend. For stop and live migration, we stop vm_clock.
> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
>
Hmm, true. What about hooking into suspend and doing vmstop during
suspend?
> >> This could happen because of stop, host suspend, live migration to a
> >> file, etc.
> >>
> >> It's much easier for us to call into qemu-ga to do the time correction
> >> whenever this event occurs than to try and have libvirt figure out when
> >> it's necessary.
> > And if guest does not have qemu-ga what is better inject interrupts like
> > crazy for next 2 minutes or leave guest with incorrect time?
>
> Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
> for a prolonged period of time isn't fixable.
>
You mean yes to "leave guest with incorrect time"? QEMU will still
consume 100% of the cpu for some time, calling the qemu_timer callback
millions of times. The timedrift code is not the right level to fix that.
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 14:42 ` Eric Blake
@ 2012-09-13 15:40 ` Avi Kivity
2012-09-13 15:50 ` Anthony Liguori
0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2012-09-13 15:40 UTC (permalink / raw)
To: Eric Blake
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel,
Anthony Liguori, Paolo Bonzini, Luiz Capitulino
On 09/13/2012 05:42 PM, Eric Blake wrote:
> On 09/13/2012 08:34 AM, Avi Kivity wrote:
>> On 09/13/2012 05:22 PM, Gleb Natapov wrote:
>>>>
>>>> It's much easier for us to call into qemu-ga to do the time correction
>>>> whenever this event occurs than to try and have libvirt figure out when
>>>> it's necessary.
>>
>>> And if guest does not have qemu-ga what is better inject interrupts like
>>> crazy for next 2 minutes or leave guest with incorrect time?
>>
>> We can try to S3 and resume the guest. The guest will assume that it
>> has slept for an unknown amount of time and resync from the RTC.
>
> Just restating to make sure I'm clear: you are proposing that after host
> suspend and host wakeup, _then_ qemu asks the guest to go into S3
> followed by an immediate resume. During the time that the guest goes
> into S3, its clock is way off; but then the immediate resume will be the
> necessary kick to the guest to resync, and at that point, the guest can
> assume that it has been off since the host originally suspended (rather
> than the instant window where qemu actually bounced S3 after the host
> had already resumed).
Correct. In theory we can shut down networking while this is happening
so the guest can't tell it's in a time machine.
>
>> This may not work for really old server oriented guests.
>
> S3 requires guest cooperation, period. But so does qemu-ga. It's
> better than nothing, and we can't get perfection without guest cooperation.
qemu-ga requires either an admin/tool to install qemu-ga, or for qemu-ga
to be preinstalled by the OS vendor. S3 requires S3 support to be
provided by the host vendor, and for it to be functional. A
significantly easier bar to clear.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 15:40 ` Avi Kivity
@ 2012-09-13 15:50 ` Anthony Liguori
2012-09-13 15:53 ` Avi Kivity
0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2012-09-13 15:50 UTC (permalink / raw)
To: Avi Kivity, Eric Blake
Cc: Michael Roth, Gleb Natapov, Jan Kiszka, qemu-devel,
Luiz Capitulino, Paolo Bonzini
Avi Kivity <avi@redhat.com> writes:
> On 09/13/2012 05:42 PM, Eric Blake wrote:
>> On 09/13/2012 08:34 AM, Avi Kivity wrote:
>>> On 09/13/2012 05:22 PM, Gleb Natapov wrote:
>>>>>
>>>>> It's much easier for us to call into qemu-ga to do the time correction
>>>>> whenever this event occurs than to try and have libvirt figure out when
>>>>> it's necessary.
>>>
>>>> And if guest does not have qemu-ga what is better inject interrupts like
>>>> crazy for next 2 minutes or leave guest with incorrect time?
>>>
>>> We can try to S3 and resume the guest. The guest will assume that it
>>> has slept for an unknown amount of time and resync from the RTC.
>>
>> Just restating to make sure I'm clear: you are proposing that after host
>> suspend and host wakeup, _then_ qemu asks the guest to go into S3
>> followed by an immediate resume. During the time that the guest goes
>> into S3, its clock is way off; but then the immediate resume will be the
>> necessary kick to the guest to resync, and at that point, the guest can
>> assume that it has been off since the host originally suspended (rather
>> than the instant window where qemu actually bounced S3 after the host
>> had already resumed).
>
> Correct. In theory we can shut down networking while this is happening
> so the guest can't tell it's in a time machine.
>
>>
>>> This may not work for really old server oriented guests.
>>
>> S3 requires guest cooperation, period. But so does qemu-ga. It's
>> better than nothing, and we can't get perfection without guest cooperation.
>
> qemu-ga requires either an admin/tool to install qemu-ga, or for qemu-ga
> to be preinstalled by the OS vendor. S3 requires S3 support to be
> provided by the host vendor, and for it to be functional. A
> significantly easier bar to clear.
We can easily generate an ISO that includes a pre-built version of
qemu-ga.
Plus, there's a whole variety of other features enabled once we can
assume qemu-ga is available. It's worth solving that problem.
Regards,
Anthony Liguori
>
>
> --
> error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 14:48 ` Gleb Natapov
@ 2012-09-13 15:51 ` Avi Kivity
2012-09-13 15:56 ` Anthony Liguori
1 sibling, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2012-09-13 15:51 UTC (permalink / raw)
To: Gleb Natapov
Cc: Jan Kiszka, Michael Roth, qemu-devel, Anthony Liguori,
Paolo Bonzini, Luiz Capitulino, Eric Blake
On 09/13/2012 05:48 PM, Gleb Natapov wrote:
> On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
>> Gleb Natapov <gleb@redhat.com> writes:
>>
>> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
>> >> "Daniel P. Berrange" <berrange@redhat.com> writes:
>> >>
>> >> I think it's better for QEMU to talk to qemu-ga. We can tell when a large
>> >> period of time has passed in QEMU because we'll accumulate a large
>> >> number of missed ticks.
>> >>
>> > With RTC configured to use vm clock we will not.
>>
>> Not for host suspend. For stop and live migration, we stop vm_clock.
>> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
>>
> Hmm, true. What about hooking into suspend and doing vmstop during
> suspend.
There is a DBus API (UPower) with a method called AboutToSleep(). No
idea what it does or if unprivileged processes can access it. libvirt
or qemud could do it, of course.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 15:50 ` Anthony Liguori
@ 2012-09-13 15:53 ` Avi Kivity
2012-09-13 18:27 ` Anthony Liguori
0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2012-09-13 15:53 UTC (permalink / raw)
To: Anthony Liguori
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel, Paolo Bonzini,
Luiz Capitulino, Eric Blake
On 09/13/2012 06:50 PM, Anthony Liguori wrote:
>>>
>>>> This may not work for really old server oriented guests.
>>>
>>> S3 requires guest cooperation, period. But so does qemu-ga. It's
>>> better than nothing, and we can't get perfection without guest cooperation.
>>
>> qemu-ga requires either an admin/tool to install qemu-ga, or for qemu-ga
>> to be preinstalled by the OS vendor. S3 requires S3 support to be
>> provided by the host vendor, and for it to be functional. A
>> significantly easier bar to clear.
>
> We can easily generate an ISO that includes a pre-built version of
> qemu-ga.
We could easily generate water, but we can't make the horse drink it.
>
> Plus, there's a whole variety of other features enabled once we can
> assume qemu-ga is available. It's worth solving that problem.
We can't assume it. Too many OSes exist, too many guests already
exist and ain't broken, too many vendors are moving into a locked-down
model. I agree it's great and we should take advantage of it, but we
can't assume it's there.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 14:48 ` Gleb Natapov
2012-09-13 15:51 ` Avi Kivity
@ 2012-09-13 15:56 ` Anthony Liguori
2012-09-13 16:06 ` Gleb Natapov
2012-09-13 16:08 ` Avi Kivity
1 sibling, 2 replies; 48+ messages in thread
From: Anthony Liguori @ 2012-09-13 15:56 UTC (permalink / raw)
To: Gleb Natapov
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
Gleb Natapov <gleb@redhat.com> writes:
> On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
>> Gleb Natapov <gleb@redhat.com> writes:
>>
>> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
>> >> "Daniel P. Berrange" <berrange@redhat.com> writes:
>> >>
>> >> I think it's better for QEMU to talk to qemu-ga. We can tell when a large
>> >> period of time has passed in QEMU because we'll accumulate a large
>> >> number of missed ticks.
>> >>
>> > With RTC configured to use vm clock we will not.
>>
>> Not for host suspend. For stop and live migration, we stop vm_clock.
>> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
>>
> Hmm, true. What about hooking into suspend and doing vmstop during
> suspend.
Is suspend the only foreseeable way for this problem to happen? I don't
think it is, which is what concerns me about any approach that relies on
"hooking suspend".
Also, I don't think there is a generic way to "hook suspend".
>> >> This could happen because of stop, host suspend, live migration to a
>> >> file, etc.
>> >>
>> >> It's much easier for us to call into qemu-ga to do the time correction
>> >> whenever this event occurs than to try and have libvirt figure out when
>> >> it's necessary.
>> > And if guest does not have qemu-ga what is better inject interrupts like
>> > crazy for next 2 minutes or leave guest with incorrect time?
>>
>> Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
>> for a prolonged period of time isn't fixable.
>>
> You mean yes to "leave guest with incorrect time"? QEMU will still
> consume 100% of cpu for some time calling qemu_timer callback millions
> times. timedrift code is not the right level to fix that.
Not if we put a cap on how many interrupts we'll try to catch up.
As I mentioned previously, if we accrue more than X missed ticks, we
should simply declare bankruptcy and reset the counter.
When that occurs, *if* qemu-ga is present, we should ask qemu-ga to
reset the guest's clock based on reading the hardware clock via a
'guest-resync-time' command.
If it isn't, time will be off. Hopefully the guest is running NTP and
can correct itself. Otherwise, at least the admin can manually fix the
time.
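The guest half of such a hypothetical 'guest-resync-time' command could
be as small as this sketch (the name is the proposal above, not an
existing qemu-ga command; assumes Linux, an RTC holding UTC, and root):

    #include <fcntl.h>
    #include <linux/rtc.h>
    #include <sys/ioctl.h>
    #include <sys/time.h>
    #include <time.h>
    #include <unistd.h>

    int guest_resync_time(void)
    {
        struct rtc_time rt;
        struct tm tm = {0};
        struct timeval tv = {0};
        int fd = open("/dev/rtc", O_RDONLY);

        if (fd < 0) {
            return -1;
        }
        if (ioctl(fd, RTC_RD_TIME, &rt) < 0) {
            close(fd);
            return -1;
        }
        close(fd);

        tm.tm_sec = rt.tm_sec;   tm.tm_min = rt.tm_min;
        tm.tm_hour = rt.tm_hour; tm.tm_mday = rt.tm_mday;
        tm.tm_mon = rt.tm_mon;   tm.tm_year = rt.tm_year;

        tv.tv_sec = timegm(&tm);         /* RTC assumed to hold UTC */
        return settimeofday(&tv, NULL);  /* step the system clock */
    }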
Regards,
Anthony Liguori
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 15:56 ` Anthony Liguori
@ 2012-09-13 16:06 ` Gleb Natapov
2012-09-13 18:33 ` Anthony Liguori
2012-09-13 16:08 ` Avi Kivity
1 sibling, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2012-09-13 16:06 UTC (permalink / raw)
To: Anthony Liguori
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
On Thu, Sep 13, 2012 at 10:56:56AM -0500, Anthony Liguori wrote:
> Gleb Natapov <gleb@redhat.com> writes:
>
> > On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
> >> Gleb Natapov <gleb@redhat.com> writes:
> >>
> >> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
> >> >> "Daniel P. Berrange" <berrange@redhat.com> writes:
> >> >>
> >> >> I think it's better for QEMU to talk to qemu-ga. We can tell when a large
> >> >> period of time has passed in QEMU because we'll accumulate a large
> >> >> number of missed ticks.
> >> >>
> >> > With RTC configured to use vm clock we will not.
> >>
> >> Not for host suspend. For stop and live migration, we stop vm_clock.
> >> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
> >>
> > Hmm, true. What about hooking into suspend and doing vmstop during
> > suspend.
>
> Is suspend the only foreseeable way for this problem to happen? I don't
> think it is which is what concerns me about any approach that relies on
> "hooking suspend".
>
With the RTC using the real time clock, setting the host time far ahead
of what it is will trigger the same behaviour, I think.
> Also, I don't think there is a generic way to "hook suspend".
>
> >> >> This could happen because of stop, host suspend, live migration to a
> >> >> file, etc.
> >> >>
> >> >> It's much easier for us to call into qemu-ga to do the time correction
> >> >> whenever this event occurs than to try and have libvirt figure out when
> >> >> it's necessary.
> >> > And if guest does not have qemu-ga what is better inject interrupts like
> >> > crazy for next 2 minutes or leave guest with incorrect time?
> >>
> >> Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
> >> for a prolonged period of time isn't fixable.
> >>
> > You mean yes to "leave guest with incorrect time"? QEMU will still
> > consume 100% of cpu for some time calling qemu_timer callback millions
> > times. timedrift code is not the right level to fix that.
>
> Not if we put a cap on how many interrupts we'll try to catch up.
>
Interrupt catchup happens at another level. If the guest was stopped for
24 hours while the RTC was configured to 1kHz (really 1024 Hz), qemu_timer
will fire the callback 88473600 times (24 x 3600 s x 1024 ticks/s). Each
invocation will try to inject an interrupt and fail, incrementing
coalesced_irq instead. You can cap coalesced_irq, but the callback will
still fire 88473600 times.
> As I mentioned previously, if we acrue more than X number of missed
> ticks, we should simply declare bankruptcy and reset the counter.
>
> When that occurs, *if* qemu-ga is present, we should ask qemu-ga to
> reset the guest's clock based on reading the hardware clock via a
> 'guest-resync-time' command.
>
> If it isn't, time will be off. Hopefully the guest is running NTP and
> can correct itself. Otherwise, at least the admin can manually fix the
> time.
>
> Regards,
>
> Anthony Liguori
>
> >
> > --
> > Gleb.
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 15:56 ` Anthony Liguori
2012-09-13 16:06 ` Gleb Natapov
@ 2012-09-13 16:08 ` Avi Kivity
1 sibling, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2012-09-13 16:08 UTC (permalink / raw)
To: Anthony Liguori
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel, Paolo Bonzini,
Luiz Capitulino, Eric Blake
On 09/13/2012 06:56 PM, Anthony Liguori wrote:
>>>
>> Hmm, true. What about hooking into suspend and doing vmstop during
>> suspend.
>
> Is suspend the only foreseeable way for this problem to happen? I don't
> think it is which is what concerns me about any approach that relies on
> "hooking suspend".
No: SIGSTOP/SIGCONT (can hook SIGCONT), gdb (can't hook, but is very
rare), ENOSPC + waiting for more space to be provisioned (already known
to qemu), NFS access with the qemu core on a dead server, severe
swapstorms.
> Also, I don't think there is a generic way to "hook suspend".
That is what we have Lennart for.
>>> >> This could happen because of stop, host suspend, live migration to a
>>> >> file, etc.
>>> >>
>>> >> It's much easier for us to call into qemu-ga to do the time correction
>>> >> whenever this event occurs than to try and have libvirt figure out when
>>> >> it's necessary.
>>> > And if guest does not have qemu-ga what is better inject interrupts like
>>> > crazy for next 2 minutes or leave guest with incorrect time?
>>>
>>> Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
>>> for a prolonged period of time isn't fixable.
>>>
>> You mean yes to "leave guest with incorrect time"? QEMU will still
>> consume 100% of cpu for some time calling qemu_timer callback millions
>> times. timedrift code is not the right level to fix that.
>
> Not if we put a cap on how many interrupts we'll try to catch up.
>
> As I mentioned previously, if we acrue more than X number of missed
> ticks, we should simply declare bankruptcy and reset the counter.
If we know we're missing N ticks, we can simply pass N to the handler.
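Sketched (illustrative names, not the actual qemu_timer API):

    #include <stdint.h>

    typedef void tick_handler(void *opaque, int64_t nticks);

    /* Fire once per wakeup and tell the handler how many periods really
     * elapsed, instead of invoking it once per missed period. */
    static void timer_fire(tick_handler *cb, void *opaque, int64_t now_ns,
                           int64_t *deadline_ns, int64_t period_ns)
    {
        if (now_ns >= *deadline_ns) {
            int64_t n = (now_ns - *deadline_ns) / period_ns + 1;

            cb(opaque, n);                  /* one call covers n ticks */
            *deadline_ns += n * period_ns;  /* next deadline is in the future */
        }
    }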
>
> When that occurs, *if* qemu-ga is present, we should ask qemu-ga to
> reset the guest's clock based on reading the hardware clock via a
> 'guest-resync-time' command.
>
> If it isn't, time will be off. Hopefully the guest is running NTP and
> can correct itself. Otherwise, at least the admin can manually fix the
> time.
There is also the fake S3 (post host resume) that can get the guest to
read its RTC.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 15:53 ` Avi Kivity
@ 2012-09-13 18:27 ` Anthony Liguori
2012-09-16 10:05 ` Avi Kivity
0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2012-09-13 18:27 UTC (permalink / raw)
To: Avi Kivity
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel, Paolo Bonzini,
Luiz Capitulino, Eric Blake
Avi Kivity <avi@redhat.com> writes:
> On 09/13/2012 06:50 PM, Anthony Liguori wrote:
>>>>
>>>>> This may not work for really old server oriented guests.
>>>>
>>>> S3 requires guest cooperation, period. But so does qemu-ga. It's
>>>> better than nothing, and we can't get perfection without guest cooperation.
>>>
>>> qemu-ga requires either an admin/tool to install qemu-ga, or for qemu-ga
>>> to be preinstalled by the OS vendor. S3 requires S3 support to be
>>> provided by the host vendor, and for it to be functional. A
>>> significantly easier bar to clear.
>>
>> We can easily generate an ISO that includes a pre-built version of
>> qemu-ga.
>
> We could easily generate water, but we can't make the horse drink it.
>
>>
>> Plus, there's a whole variety of other features enabled once we can
>> assume qemu-ga is available. It's worth solving that problem.
>
> We can't assume it. Too many OSes exist, too many guests are already
> exist and ain't broken, too many vendors are moving into a locked-down
> model. I agree it's great and we should take advantage of it, but we
> can't assume it's there.
All the same can be said about virtio, yet we still add features that
depend on it.
If there was a better/equivalent solution that didn't depend on qemu-ga,
I'd be all for it. But there isn't AFAICT.
Regards,
Anthony Liguori
>
> --
> error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 16:06 ` Gleb Natapov
@ 2012-09-13 18:33 ` Anthony Liguori
2012-09-13 18:56 ` Gleb Natapov
0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2012-09-13 18:33 UTC (permalink / raw)
To: Gleb Natapov
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
Gleb Natapov <gleb@redhat.com> writes:
> On Thu, Sep 13, 2012 at 10:56:56AM -0500, Anthony Liguori wrote:
>> Gleb Natapov <gleb@redhat.com> writes:
>>
>> > On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
>> >> Gleb Natapov <gleb@redhat.com> writes:
>> >>
>> >> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
>> >> >> "Daniel P. Berrange" <berrange@redhat.com> writes:
>> >> >>
>> >> >> I think it's better for QEMU to talk to qemu-ga. We can tell when a large
>> >> >> period of time has passed in QEMU because we'll accumulate a large
>> >> >> number of missed ticks.
>> >> >>
>> >> > With RTC configured to use vm clock we will not.
>> >>
>> >> Not for host suspend. For stop and live migration, we stop vm_clock.
>> >> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
>> >>
>> > Hmm, true. What about hooking into suspend and doing vmstop during
>> > suspend.
>>
>> Is suspend the only foreseeable way for this problem to happen? I don't
>> think it is which is what concerns me about any approach that relies on
>> "hooking suspend".
>>
> With RTC using real time clock setting host time far ahead of what is it
> will trigger same behaviour I think.
>
>> Also, I don't think there is a generic way to "hook suspend".
>>
>> >> >> This could happen because of stop, host suspend, live migration to a
>> >> >> file, etc.
>> >> >>
>> >> >> It's much easier for us to call into qemu-ga to do the time correction
>> >> >> whenever this event occurs than to try and have libvirt figure out when
>> >> >> it's necessary.
>> >> > And if guest does not have qemu-ga what is better inject interrupts like
>> >> > crazy for next 2 minutes or leave guest with incorrect time?
>> >>
>> >> Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
>> >> for a prolonged period of time isn't fixable.
>> >>
>> > You mean yes to "leave guest with incorrect time"? QEMU will still
>> > consume 100% of cpu for some time calling qemu_timer callback millions
>> > times. timedrift code is not the right level to fix that.
>>
>> Not if we put a cap on how many interrupts we'll try to catch up.
>>
> Interrupts ctachup happens at another level. If guest was stopped for
> 24 hours while RTC was configured to 1kHz qemu_timer will fire callback
> 88473600 times. Each invocation will try to inject interrupt and fail
> incrementing coalesced_irq instead. You can cap coalesced_irq but
> callback will still fire 88473600 times.
That's a bug.
The next period calculation should not be based on the last period +
length of period but rather on the current time + delta to next period
boundary.
IOW, we shouldn't arm timers to expire backwards in time from when
the event occurred. That should be accounted as a missed tick.
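In sketch form (illustrative, not the current qemu_timer code):

    #include <stdint.h>

    /* today: rearm relative to the old deadline; when we are far
     * behind, the new deadline is still in the past, so the timer
     * fires again immediately, once per missed period */
    static int64_t rearm_naive(int64_t expired_ns, int64_t period_ns)
    {
        return expired_ns + period_ns;
    }

    /* proposed: rearm at the first period boundary after 'now' and
     * account the skipped periods as missed ticks instead */
    static int64_t rearm_skipping(int64_t expired_ns, int64_t now_ns,
                                  int64_t period_ns, int64_t *missed)
    {
        int64_t behind = now_ns - expired_ns;

        if (behind <= 0) {
            return expired_ns + period_ns;
        }
        *missed += behind / period_ns;
        return now_ns + (period_ns - behind % period_ns);
    }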
Regards,
Anthony Liguori
>
>> As I mentioned previously, if we acrue more than X number of missed
>> ticks, we should simply declare bankruptcy and reset the counter.
>>
>> When that occurs, *if* qemu-ga is present, we should ask qemu-ga to
>> reset the guest's clock based on reading the hardware clock via a
>> 'guest-resync-time' command.
>>
>> If it isn't, time will be off. Hopefully the guest is running NTP and
>> can correct itself. Otherwise, at least the admin can manually fix the
>> time.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>> >
>> > --
>> > Gleb.
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 18:33 ` Anthony Liguori
@ 2012-09-13 18:56 ` Gleb Natapov
2012-09-13 20:06 ` Anthony Liguori
0 siblings, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2012-09-13 18:56 UTC (permalink / raw)
To: Anthony Liguori
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
On Thu, Sep 13, 2012 at 01:33:31PM -0500, Anthony Liguori wrote:
> Gleb Natapov <gleb@redhat.com> writes:
>
> > On Thu, Sep 13, 2012 at 10:56:56AM -0500, Anthony Liguori wrote:
> >> Gleb Natapov <gleb@redhat.com> writes:
> >>
> >> > On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
> >> >> Gleb Natapov <gleb@redhat.com> writes:
> >> >>
> >> >> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
> >> >> >> "Daniel P. Berrange" <berrange@redhat.com> writes:
> >> >> >>
> >> >> >> I think it's better for QEMU to talk to qemu-ga. We can tell when a large
> >> >> >> period of time has passed in QEMU because we'll accumulate a large
> >> >> >> number of missed ticks.
> >> >> >>
> >> >> > With RTC configured to use vm clock we will not.
> >> >>
> >> >> Not for host suspend. For stop and live migration, we stop vm_clock.
> >> >> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
> >> >>
> >> > Hmm, true. What about hooking into suspend and doing vmstop during
> >> > suspend.
> >>
> >> Is suspend the only foreseeable way for this problem to happen? I don't
> >> think it is which is what concerns me about any approach that relies on
> >> "hooking suspend".
> >>
> > With RTC using real time clock setting host time far ahead of what is it
> > will trigger same behaviour I think.
> >
> >> Also, I don't think there is a generic way to "hook suspend".
> >>
> >> >> >> This could happen because of stop, host suspend, live migration to a
> >> >> >> file, etc.
> >> >> >>
> >> >> >> It's much easier for us to call into qemu-ga to do the time correction
> >> >> >> whenever this event occurs than to try and have libvirt figure out when
> >> >> >> it's necessary.
> >> >> > And if guest does not have qemu-ga what is better inject interrupts like
> >> >> > crazy for next 2 minutes or leave guest with incorrect time?
> >> >>
> >> >> Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
> >> >> for a prolonged period of time isn't fixable.
> >> >>
> >> > You mean yes to "leave guest with incorrect time"? QEMU will still
> >> > consume 100% of cpu for some time calling qemu_timer callback millions
> >> > times. timedrift code is not the right level to fix that.
> >>
> >> Not if we put a cap on how many interrupts we'll try to catch up.
> >>
> > Interrupts ctachup happens at another level. If guest was stopped for
> > 24 hours while RTC was configured to 1kHz qemu_timer will fire callback
> > 88473600 times. Each invocation will try to inject interrupt and fail
> > incrementing coalesced_irq instead. You can cap coalesced_irq but
> > callback will still fire 88473600 times.
>
> That's a bug.
>
> The next period calculation should not be based on the last period +
> length of period but rather on the current time + delta to next period
> boundary.
>
I disagree that this is a bug. This is by design, to account for timer
signals that were delivered too late.
> IOW, we shouldn't arm timers to expire backwards in time from when
> the event occurred. That should be accounted as a missed tick.
>
Not all users of qemu_timer have their own missed-tick accounting, so
qemu_timer provides a general one. We can create another time source
for qemu_timer without this behaviour and use it for the RTC.
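As a rough sketch of the difference (assuming the 2012-era
qemu_mod_timer()/qemu_get_clock_ns() interface; the state struct and
function names below are made up for illustration):

    /* Illustrative periodic-timer state; the real device struct differs. */
    typedef struct PeriodicState {
        QEMUTimer *timer;
        int64_t expire_time;     /* ns on vm_clock */
        int64_t period_ns;
        uint32_t coalesced_irq;  /* ticks still owed to the guest */
    } PeriodicState;

    /* Current behaviour: replay every missed period.  After a long stop,
     * expire_time lags far behind now, so the timer fires immediately,
     * once per missed tick. */
    static void periodic_cb_replay(PeriodicState *s)
    {
        s->expire_time += s->period_ns;          /* last period + length */
        qemu_mod_timer(s->timer, s->expire_time);
    }

    /* The alternative: account the missed ticks, then arm the timer for
     * the first period boundary after now, so the callback fires at most
     * once per real period. */
    static void periodic_cb_skip(PeriodicState *s)
    {
        int64_t now = qemu_get_clock_ns(vm_clock);
        int64_t behind = now - s->expire_time;

        if (behind > 0) {
            s->coalesced_irq += behind / s->period_ns;  /* missed ticks */
            s->expire_time = now + s->period_ns - (behind % s->period_ns);
        } else {
            s->expire_time += s->period_ns;
        }
        qemu_mod_timer(s->timer, s->expire_time);
    }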
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 18:56 ` Gleb Natapov
@ 2012-09-13 20:06 ` Anthony Liguori
0 siblings, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2012-09-13 20:06 UTC (permalink / raw)
To: Gleb Natapov
Cc: qemu-devel, Jan Kiszka, Michael Roth, Luiz Capitulino, Avi Kivity,
Paolo Bonzini, Eric Blake
Gleb Natapov <gleb@redhat.com> writes:
>> That's a bug.
>>
>> The next period calculation should not be based on the last period +
>> length of period but rather on the current time + delta to next period
>> boundary.
>>
> I disagree that this is a bug. This is by design, to account for timer
> signals that were delivered too late.
It's immediate reinjection of ticks missed due to timer delay, so that
even if you select tdf=slew, you're still doing immediate reinjection.
Maybe it's just semantics whether that's a bug or a feature, but
hopefully you agree that making tdf=slew gradually deliver those missed
ticks would be a good thing.
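A minimal sketch of what "gradually" could mean here (the state and
helper names are hypothetical, and the one-extra-tick-per-period bound,
i.e. catching up at 2x the nominal rate, is an arbitrary choice for
illustration):

    /* Deliver at most one catch-up tick per real period instead of
     * injecting the whole backlog at once: guest time runs at up to 2x
     * until coalesced_irq drains to zero. */
    static void rtc_slew_tick(PeriodicState *s)
    {
        rtc_inject_irq(s);                /* the regular tick */
        if (s->coalesced_irq > 0) {
            s->coalesced_irq--;
            rtc_inject_irq(s);            /* one catch-up tick, no more */
        }
        rtc_arm_next_period(s);           /* rearm at the next boundary */
    }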
>
>
>> IOW, if we shouldn't arm timers to expire backwards in time from when
>> the event occurred. That should be accounted as a missed tick.
>>
> Not all users of qemu_timer have their own missed tick accounting so
> qemu_timer provides general one. We can create another time source
> for qemu_timer without this behaviour and use it in RTC.
I think we probably should start by getting one device to have good
catchup behavior and then we can look at generalizing.
There are other things we've never done that would help too, like
allowing for non-base-10 timer frequencies, which would make some of
the time calculations easier.
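For instance (illustrative arithmetic only), the RTC's periodic rates
are powers of two and don't divide evenly into nanoseconds:

    1024 Hz -> 1,000,000,000 ns / 1024 = 976,562.5 ns per period

so a nanosecond-granularity period has to be rounded, and the error
accumulates over millions of ticks unless the next expiry is derived
from the tick count (e.g. something like muldiv64(ticks, 1000000000,
freq)) rather than by repeatedly adding a rounded period.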
But I think at this point I need to send some patches, which requires
some hacking first.
Regards,
Anthony Liguori
>
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-13 18:27 ` Anthony Liguori
@ 2012-09-16 10:05 ` Avi Kivity
2012-09-16 14:37 ` Anthony Liguori
0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2012-09-16 10:05 UTC (permalink / raw)
To: Anthony Liguori
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel, Paolo Bonzini,
Luiz Capitulino, Eric Blake
On 09/13/2012 09:27 PM, Anthony Liguori wrote:
>>>
>>> Plus, there's a whole variety of other features enabled once we can
>>> assume qemu-ga is available. It's worth solving that problem.
>>
>> We can't assume it. Too many OSes exist, too many guests already
>> exist and ain't broken, too many vendors are moving into a locked-down
>> model. I agree it's great and we should take advantage of it, but we
>> can't assume it's there.
>
> All the same can be said about virtio yet we still add features that
> depend on it.
We don't. We can boot out-of-the-box guests and they will be fully
functioning without virtio, if slow.
>
> If there was a better/equivalent solution that didn't depend on qemu-ga,
> I'd be all for it. But there isn't AFAICT.
Perhaps there is. We fixed the problem for Linux by adding kvmclock and
backporting it to distros that users are most likely to use. Windows
fixed the problem by adding their own pv clock interface. So we need to
implement that, then focus on tick catchup for Windows XP and other
guests with no pv interface (*BSD, etc.)
Those older guests are also less likely to have a qemu-ga port or
administrator motivation to install it.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-16 10:05 ` Avi Kivity
@ 2012-09-16 14:37 ` Anthony Liguori
2012-09-19 15:34 ` Avi Kivity
0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2012-09-16 14:37 UTC (permalink / raw)
To: Avi Kivity
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel, Paolo Bonzini,
Luiz Capitulino, Eric Blake
Avi Kivity <avi@redhat.com> writes:
> On 09/13/2012 09:27 PM, Anthony Liguori wrote:
>> If there was a better/equivalent solution that didn't depend on qemu-ga,
>> I'd be all for it. But there isn't AFAICT.
>
> Perhaps there is. We fixed the problem for Linux by adding kvmclock and
> backporting it to distros that users are most likely to use. Windows
> fixed the problem by adding their own pv clock interface. So we need to
> implement that, then focus on tick catchup for Windows XP and other
> guests with no pv interface (*BSD, etc.)
Tick catchup simply isn't going to work. That's the whole point of the thread.
>
> Those older guests are also less likely to have a qemu-ga port or
> administrator motivation to install it.
That's a strange assertion to make. FWIW, the issue with hibernation
was reported to me with a combination of WinXP and Windows 7 guests; in
this case, it's a totally new deployment. Adding qemu-ga is totally
reasonable.
Regards,
Anthony Liguori
>
>
> --
> error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-16 14:37 ` Anthony Liguori
@ 2012-09-19 15:34 ` Avi Kivity
2012-09-19 16:37 ` Gleb Natapov
0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2012-09-19 15:34 UTC (permalink / raw)
To: Anthony Liguori
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel, Paolo Bonzini,
Luiz Capitulino, Eric Blake
On 09/16/2012 05:37 PM, Anthony Liguori wrote:
> Avi Kivity <avi@redhat.com> writes:
>
>> On 09/13/2012 09:27 PM, Anthony Liguori wrote:
>>> If there was a better/equivalent solution that didn't depend on qemu-ga,
>>> I'd be all for it. But there isn't AFAICT.
>>
>> Perhaps there is. We fixed the problem for Linux by adding kvmclock and
>> backporting it to distros that users are most likely to use. Windows
>> fixed the problem by adding their own pv clock interface. So we need to
>> implement that, then focus on tick catchup for Windows XP and other
>> guests with no pv interface (*BSD, etc.)
>
> Tick catchup simply isn't going to work. That's the whole point of the thread.
I'll restate. Windows and Linux don't need either qemu-ga or tick
catchup since they have pv time interfaces. FreeBSD and less frequently
used guests are unlikely to get a qemu-ga port, so they need tick
catchup. Is there reason to believe tick catchup won't work on FreeBSD?
>>
>> Those older guests are also less likely to have a qemu-ga port or
>> administrator motivation to install it.
>
> That's a strange assertion to make. FWIW, the issue with hibernation
> was reported to me with a combination of WinXP and Windows 7 guests; in
> this case, it's a totally new deployment. Adding qemu-ga is totally
> reasonable.
Windows 7 doesn't need anything if we implement the pv time interface.
That is less effort than requiring a qemu-ga installation. Windows XP
is an edge case. We can of course support qemu-ga for it, or we can
massage the tick code to work with it, since its timekeeping is likely
a lot less sophisticated than 7's.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-19 15:34 ` Avi Kivity
@ 2012-09-19 16:37 ` Gleb Natapov
2012-09-19 16:44 ` Avi Kivity
0 siblings, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2012-09-19 16:37 UTC (permalink / raw)
To: Avi Kivity
Cc: Jan Kiszka, Michael Roth, qemu-devel, Anthony Liguori,
Paolo Bonzini, Luiz Capitulino, Eric Blake
On Wed, Sep 19, 2012 at 06:34:46PM +0300, Avi Kivity wrote:
> On 09/16/2012 05:37 PM, Anthony Liguori wrote:
> > Avi Kivity <avi@redhat.com> writes:
> >
> >> On 09/13/2012 09:27 PM, Anthony Liguori wrote:
> >>> If there was a better/equivalent solution that didn't depend on qemu-ga,
> >>> I'd be all for it. But there isn't AFAICT.
> >>
> >> Perhaps there is. We fixed the problem for Linux by adding kvmclock and
> >> backporting it to distros that users are most likely to use. Windows
> >> fixed the problem by adding their own pv clock interface. So we need to
> >> implement that, then focus on tick catchup for Windows XP and other
> >> guests with no pv interface (*BSD, etc.)
> >
> > Tick catchup simply isn't going to work. That's the whole point of the thread.
>
> I'll restate. Windows and Linux don't need either qemu-ga or tick
> catchup since they have pv time interfaces. FreeBSD and less frequently
> used guests are unlikely to get a qemu-ga port, so they need tick
> catchup. Is there reason to believe tick catchup won't work on FreeBSD?
>
If FreeBSD tries to compensate for lost ticks it may not work.
> >>
> >> Those older guests are also less likely to have a qemu-ga port or
> >> administrator motivation to install it.
> >
> > That's a strange assertion to make. FWIW, the issue with hibernation
> > was reported to me with a combination of WinXP and Windows 7 guests; in
> > this case, it's a totally new deployment. Adding qemu-ga is totally
> > reasonable.
>
> Windows 7 doesn't need anything if we implement the pv time interface.
What PV interface exactly? According to [1], Hyper-V also tries to
"catch up" the timer by shortening the timer period, unless too many
events were missed.
[1] http://msdn.microsoft.com/en-us/library/windows/hardware/ff542561%28v=vs.85%29.aspx
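A sketch of that policy as [1] describes it (state fields as in the
earlier sketches; the threshold and the 2x catch-up rate are made-up
numbers, purely for illustration):

    /* Hyper-V-style catch-up: shorten the period while a backlog
     * exists, but discard the backlog entirely if too much was missed. */
    #define CATCHUP_MAX_TICKS 4096         /* illustrative threshold */

    static int64_t next_period_ns(PeriodicState *s)
    {
        if (s->coalesced_irq > CATCHUP_MAX_TICKS) {
            s->coalesced_irq = 0;          /* too far behind: give up */
            return s->period_ns;
        }
        if (s->coalesced_irq > 0) {
            return s->period_ns / 2;       /* catch up at 2x rate */
        }
        return s->period_ns;               /* on schedule */
    }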
> That is less effort than requiring a qemu-ga installation. Windows XP
> is an edge case. We can of course support qemu-ga for it, or we can
> massage the tick code to work with it, since its timekeeping is likely
> a lot less sophisticated than 7's.
>
How do you propose to "massage the tick code" to compensate for 100
hours of missed ticks in a sane way? As far as I know there is no
difference in timekeeping between Windows XP and Windows 7 (at least
without PV).
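To put a number on it, at a 1024Hz periodic rate:

    100 h * 3600 s/h * 1024 ticks/s = 368,640,000 missed ticks

Even reinjecting at ten times the nominal rate, draining that backlog
would take on the order of ten hours of guest time running fast.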
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-19 16:37 ` Gleb Natapov
@ 2012-09-19 16:44 ` Avi Kivity
2012-09-19 16:55 ` Gleb Natapov
0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2012-09-19 16:44 UTC (permalink / raw)
To: Gleb Natapov
Cc: Jan Kiszka, Michael Roth, qemu-devel, Anthony Liguori,
Paolo Bonzini, Luiz Capitulino, Eric Blake
On 09/19/2012 07:37 PM, Gleb Natapov wrote:
> On Wed, Sep 19, 2012 at 06:34:46PM +0300, Avi Kivity wrote:
>> On 09/16/2012 05:37 PM, Anthony Liguori wrote:
>> > Avi Kivity <avi@redhat.com> writes:
>> >
>> >> On 09/13/2012 09:27 PM, Anthony Liguori wrote:
>> >>> If there was a better/equivalent solution that didn't depend on qemu-ga,
>> >>> I'd be all for it. But there isn't AFAICT.
>> >>
>> >> Perhaps there is. We fixed the problem for Linux by adding kvmclock and
>> >> backporting it to distros that users are most likely to use. Windows
>> >> fixed the problem by adding their own pv clock interface. So we need to
>> >> implement that, then focus on tick catchup for Windows XP and other
>> >> guests with no pv interface (*BSD, etc.)
>> >
>> > Tick catchup simply isn't going to work. That's the whole point of the thread.
>>
>> I'll restate. Windows and Linux don't need either qemu-ga or tick
>> catchup since they have pv time interfaces. FreeBSD and less frequently
>> used guests are unlikely to get a qemu-ga port, so they need tick
>> catchup. Is there reason to believe tick catchup won't work on FreeBSD?
>>
> If FreeBSD tries to compensate for lost ticks it may not work.
Right, the problem is with guests that are too clever for their own
good. No idea where FreeBSD (or the others, just using it as a
placeholder) fall. But my guess is that the less popular the guest, the
fewer dirty tricks it pulls.
>
>> >>
>> >> Those older guests are also less likely to have a qemu-ga port or
>> >> administrator motivation to install it.
>> >
>> > That's a strange assertion to make. FWIW, the issue with hibernation
>> > was reported to me with a combination of WinXP and Windows 7 guests; in
>> > this case, it's a totally new deployment. Adding qemu-ga is totally
>> > reasonable.
>>
>> Windows 7 doesn't need anything if we implement the pv time interface.
> What PV interface exactly? According to [1], Hyper-V also tries to
> "catch up" the timer by shortening the timer period, unless too many
> events were missed.
>
> [1] http://msdn.microsoft.com/en-us/library/windows/hardware/ff542561%28v=vs.85%29.aspx
>
Reference Time Counter. If Windows uses that in preference to the tick,
then tick catch up is immaterial.
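(That counter is the Hyper-V reference time MSR, in 100ns units. A
guest-side read is roughly the sketch below; the MSR index is taken
from the Hyper-V spec and should be treated as an assumption here.)

    #include <stdint.h>

    #define HV_X64_MSR_TIME_REF_COUNT 0x40000020   /* 100ns units */

    static inline uint64_t rdmsr(uint32_t msr)
    {
        uint32_t lo, hi;
        __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Monotonic time in ns, independent of tick delivery -- which is
     * why tick catchup becomes immaterial for guests that use it. */
    static uint64_t hv_time_ns(void)
    {
        return rdmsr(HV_X64_MSR_TIME_REF_COUNT) * 100;
    }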
>> That is less effort than requiring a qemu-ga installation. Windows XP
>> is an edge case. We can of course support qemu-ga for it, or we can
>> massage the tick code to work with it, since its timekeeping is likely
>> a lot less sophisticated than 7's.
>>
> How do you propose to "massage the tick code" to compensate for 100
> hours of missed ticks in a sane way?
Probably not solvable. But I'm less concerned about host suspend and
more about overcommit, which is more likely to cause missed ticks in
practice.
> As far as I know there is no
> difference in timekeeping between Windows XP and Windows 7 (at least
> without PV).
Including the rtc resync?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-19 16:44 ` Avi Kivity
@ 2012-09-19 16:55 ` Gleb Natapov
2012-09-19 16:57 ` Avi Kivity
0 siblings, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2012-09-19 16:55 UTC (permalink / raw)
To: Avi Kivity
Cc: Jan Kiszka, Michael Roth, qemu-devel, Anthony Liguori,
Paolo Bonzini, Luiz Capitulino, Eric Blake
On Wed, Sep 19, 2012 at 07:44:27PM +0300, Avi Kivity wrote:
> On 09/19/2012 07:37 PM, Gleb Natapov wrote:
> > On Wed, Sep 19, 2012 at 06:34:46PM +0300, Avi Kivity wrote:
> >> On 09/16/2012 05:37 PM, Anthony Liguori wrote:
> >> > Avi Kivity <avi@redhat.com> writes:
> >> >
> >> >> On 09/13/2012 09:27 PM, Anthony Liguori wrote:
> >> >>> If there was a better/equivalent solution that didn't depend on qemu-ga,
> >> >>> I'd be all for it. But there isn't AFAICT.
> >> >>
> >> >> Perhaps there is. We fixed the problem for Linux by adding kvmclock and
> >> >> backporting it to distros that users are most likely to use. Windows
> >> >> fixed the problem by adding their own pv clock interface. So we need to
> >> >> implement that, then focus on tick catchup for Windows XP and other
> >> >> guests with no pv interface (*BSD, etc.)
> >> >
> >> > Tick catchup simply isn't going to work. That's the whole point of the thread.
> >>
> >> I'll restate. Windows and Linux don't need either qemu-ga or tick
> >> catchup since they have pv time interfaces. FreeBSD and less frequently
> >> used guests are unlikely to get a qemu-ga port, so they need tick
> >> catchup. Is there reason to believe tick catchup won't work on FreeBSD?
> >>
> > If FreeBSD tries to compensate for lost ticks it may not work.
>
> Right, the problem is with guests that are too clever for their own
> good. No idea where FreeBSD (or the others, just using it as a
> placeholder) fall. But my guess is that the less popular the guest, the
> fewer dirty tricks it pulls.
>
> >
> >> >>
> >> >> Those older guests are also less likely to have a qemu-ga port or
> >> >> administrator motivation to install it.
> >> >
> >> > That's a strange assertion to make. FWIW, the issue with hibernation
> >> > was reported to me with a combination of WinXP and Windows 7 guests; in
> >> > this case, it's a totally new deployment. Adding qemu-ga is totally
> >> > reasonable.
> >>
> >> Windows 7 doesn't need anything if we implement the pv time interface.
> > What PV interface exactly? According to [1], Hyper-V also tries to
> > "catch up" the timer by shortening the timer period, unless too many
> > events were missed.
> >
> > [1] http://msdn.microsoft.com/en-us/library/windows/hardware/ff542561%28v=vs.85%29.aspx
> >
>
> Reference Time Counter. If Windows uses that in preference to the tick,
> then tick catch up is immaterial.
>
Windows uses it for QPC if iTSC (kvmclock) is not available. I am not at
all sure Windows uses the Reference Time Counter for timekeeping.
>
> >> That is less effort than requiring a qemu-ga installation. Windows XP
> >> is an edge case. We can of course support qemu-ga for it, or we can
> >> massage the tick code to work with it, since its timekeeping is likely
> >> a lot less sophisticated than 7's.
> >>
> > How do you propose to "massage the tick code" to compensate for 100
> > hours of missed ticks in a sane way?
>
> Probably not solvable. But I'm less concerned about host suspend and
> more about overcommit, which is more likely to cause missed ticks in
> practice.
>
> > As far as I know there is no
> > difference in timekeeping between Windows XP and Windows 7 (at least
> > without PV).
>
> Including the rtc resync?
>
You mean resyncing time with the RTC from time to time? I think so. In
practice I didn't hear any complaints about it for any Windows. We can
solve the resync problem easily, though, by reporting the time as
"current time" - "time we are going to reinject".
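As a sketch of that trick (field names hypothetical, on top of the
usual coalesced-tick counter, and using the 2012-era rtc_clock source),
the time the guest reads from the RTC would be computed from an
adjusted clock:

    /* When the guest reads the RTC time registers, hide the backlog:
     * report current time minus the ticks we still intend to reinject,
     * so the guest's periodic resync cannot race ahead of the catchup. */
    static time_t rtc_guest_time(PeriodicState *s)
    {
        int64_t pending_ns = s->coalesced_irq * s->period_ns;
        int64_t now_ns = qemu_get_clock_ns(rtc_clock);

        return (time_t)((now_ns - pending_ns) / 1000000000LL);
    }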
--
Gleb.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Qemu-devel] Rethinking missed tick catchup
2012-09-19 16:55 ` Gleb Natapov
@ 2012-09-19 16:57 ` Avi Kivity
0 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2012-09-19 16:57 UTC (permalink / raw)
To: Gleb Natapov
Cc: Jan Kiszka, Michael Roth, qemu-devel, Anthony Liguori,
Paolo Bonzini, Luiz Capitulino, Eric Blake
On 09/19/2012 07:55 PM, Gleb Natapov wrote:
> On Wed, Sep 19, 2012 at 07:44:27PM +0300, Avi Kivity wrote:
>> On 09/19/2012 07:37 PM, Gleb Natapov wrote:
>> > On Wed, Sep 19, 2012 at 06:34:46PM +0300, Avi Kivity wrote:
>> >> On 09/16/2012 05:37 PM, Anthony Liguori wrote:
>> >> > Avi Kivity <avi@redhat.com> writes:
>> >> >
>> >> >> On 09/13/2012 09:27 PM, Anthony Liguori wrote:
>> >> >>> If there was a better/equivalent solution that didn't depend on qemu-ga,
>> >> >>> I'd be all for it. But there isn't AFAICT.
>> >> >>
>> >> >> Perhaps there is. We fixed the problem for Linux by adding kvmclock and
>> >> >> backporting it to distros that users are most likely to use. Windows
>> >> >> fixed the problem by adding their own pv clock interface. So we need to
>> >> >> implement that, then focus on tick catchup for Windows XP and other
>> >> >> guests with no pv interface (*BSD, etc.)
>> >> >
>> >> > Tick catchup simply isn't going to work. That's the whole point of the thread.
>> >>
>> >> I'll restate. Windows and Linux don't need either qemu-ga or tick
>> >> catchup since they have pv time interfaces. FreeBSD and less frequently
>> >> used guests are unlikely to get a qemu-ga port, so they need tick
>> >> catchup. Is there reason to believe tick catchup won't work on FreeBSD?
>> >>
>> > If FreeBSD tries to compensate for lost ticks it may not work.
>>
>> Right, the problem is with guests that are too clever for their own
>> good. No idea where FreeBSD (or the others, just using it as a
>> placeholder) fall. But my guess is that the less popular the guest, the
>> fewer dirty tricks it pulls.
>>
>> >
>> >> >>
>> >> >> Those older guests are also less likely to have a qemu-ga port or
>> >> >> administrator motivation to install it.
>> >> >
>> >> > That's a strange assertion to make. FWIW, the issue with hibernation
>> >> > was reported to me with a combination of WinXP and Windows 7 guests; in
>> >> > this case, it's a totally new deployment. Adding qemu-ga is totally
>> >> > reasonable.
>> >>
>> >> Windows 7 doesn't need anything if we implement the pv time interface.
>> > What PV interface exactly? According to [1], Hyper-V also tries to
>> > "catch up" the timer by shortening the timer period, unless too many
>> > events were missed.
>> >
>> > [1] http://msdn.microsoft.com/en-us/library/windows/hardware/ff542561%28v=vs.85%29.aspx
>> >
>>
>> Reference Time Counter. If Windows uses that in preference to the tick,
>> then tick catch up is immaterial.
>>
> Windows uses it for QPC if iTSC (kvmclock) is not available. I am not at
> all sure Windows uses the Reference Time Counter for timekeeping.
Would be good to know, except...
>
>>
>> >> That is less effort than requiring a qemu-ga installation. Windows XP
>> >> is an edge case. We can of course support qemu-ga for it, or we can
>> >> massage the tick code to work with it, since its timekeeping is likely
>> >> a lot less sophisticated than 7's.
>> >>
>> > How do you propose to "massage the tick code" to compensate for 100
>> > hours of missed ticks in a sane way?
>>
>> Probably not solvable. But I'm less concerned about host suspend and
>> more about overcommit, which is more likely to cause missed ticks in
>> practice.
>>
>> > As far as I know there is no
>> > difference in timekeeping between Windows XP and Windows 7 (at least
>> > without PV).
>>
>> Including the rtc resync?
>>
> You mean resyncing time with the RTC from time to time? I think so. In
> practice I didn't hear any complaints about it for any Windows. We can
> solve the resync problem easily, though, by reporting the time as
> "current time" - "time we are going to reinject".
Clever! I think this is a point in favour of tick catchup (also, the
fact that Hyper-V uses it). Of course it's hard to implement with some
time sources in the kernel and some in qemu.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 48+ messages in thread
end of thread
Thread overview: 48+ messages
2012-09-12 13:54 [Qemu-devel] Rethinking missed tick catchup Anthony Liguori
2012-09-12 14:21 ` Jan Kiszka
2012-09-12 14:44 ` Anthony Liguori
2012-09-12 14:50 ` Jan Kiszka
2012-09-12 15:06 ` Gleb Natapov
2012-09-12 15:42 ` Jan Kiszka
2012-09-12 15:45 ` Gleb Natapov
2012-09-12 16:16 ` Gleb Natapov
2012-09-12 15:15 ` Gleb Natapov
2012-09-12 18:19 ` Anthony Liguori
2012-09-13 10:49 ` Gleb Natapov
2012-09-13 13:14 ` Eric Blake
2012-09-13 13:28 ` Daniel P. Berrange
2012-09-13 14:06 ` Anthony Liguori
2012-09-13 14:22 ` Gleb Natapov
2012-09-13 14:34 ` Avi Kivity
2012-09-13 14:42 ` Eric Blake
2012-09-13 15:40 ` Avi Kivity
2012-09-13 15:50 ` Anthony Liguori
2012-09-13 15:53 ` Avi Kivity
2012-09-13 18:27 ` Anthony Liguori
2012-09-16 10:05 ` Avi Kivity
2012-09-16 14:37 ` Anthony Liguori
2012-09-19 15:34 ` Avi Kivity
2012-09-19 16:37 ` Gleb Natapov
2012-09-19 16:44 ` Avi Kivity
2012-09-19 16:55 ` Gleb Natapov
2012-09-19 16:57 ` Avi Kivity
2012-09-13 14:35 ` Anthony Liguori
2012-09-13 14:48 ` Gleb Natapov
2012-09-13 15:51 ` Avi Kivity
2012-09-13 15:56 ` Anthony Liguori
2012-09-13 16:06 ` Gleb Natapov
2012-09-13 18:33 ` Anthony Liguori
2012-09-13 18:56 ` Gleb Natapov
2012-09-13 20:06 ` Anthony Liguori
2012-09-13 16:08 ` Avi Kivity
2012-09-13 13:47 ` Gleb Natapov
2012-09-12 16:27 ` Stefan Weil
2012-09-12 16:45 ` Gleb Natapov
2012-09-12 17:30 ` Stefan Weil
2012-09-12 18:13 ` Gleb Natapov
2012-09-12 19:45 ` Stefan Weil
2012-09-13 10:50 ` Gleb Natapov
2012-09-12 20:06 ` Michael Roth
2012-09-12 17:23 ` Luiz Capitulino