* windows acpi time drift
@ 2008-03-18 23:09 Dor Laor
2008-03-18 23:33 ` Dor Laor
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Dor Laor @ 2008-03-18 23:09 UTC (permalink / raw)
To: kvm-devel, Avi Kivity
After some research into time drift with the Windows ACPI HAL, I
discovered that it uses the ... rtc timer as its clock source.
Not the apic, the acpi timer, or the pit. The acpi timer is not used by the
timekeeping clock, and the apic & pit timer irqs are masked.
In order to fix the time drift we need to fix the rtc emulation.
The problem is that, like the pit and the apic timers in userspace, the
rtc is also driven by an inaccurate timer, so irqs get coalesced before
the guest interrupt controller acknowledges them.
We have two options:
1. Bring another device to the kernel
- It's a simple device
- It will make the rtc clock more accurate (hrtimer)
- Easy time drift fix like apic/pic
- It has a very minor performance improvement from removing the
need to go to userspace after vmexit, thus not syncing vmcs.
But it's only a 15msec * 2 rate.
but
- both the pit & rtc somewhat duplicate code from
userspace. Both need a more accurate timer + an interface to
detect irq acks by the pic/apic.
2. The other option is to have an accurate userspace timer (userspace
hrtimers exist in kernels >= 2.6.24) and to add an interface to the
pic/apic to queue pending irqs from the pit/rtc.
The pending queue can be a simple atomic counter per irq.
Note that we also need support for older host kernels.
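
The "atomic counter per irq" idea in option 2 could look roughly like the sketch below. This is only an illustration under assumptions: NR_IRQS, timer_tick() and take_pending() are hypothetical names, not existing KVM or QEMU interfaces. The timer side increments the counter on every expiry, and the interrupt-controller side consumes one tick each time it is able to deliver.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_IRQS 16  /* hypothetical: number of legacy irq lines */

/* One pending-tick counter per irq line (hypothetical layout). */
static atomic_uint pending[NR_IRQS];

/* Timer side: record an expiry even if the previous tick is still unacked. */
static void timer_tick(unsigned irq)
{
    assert(irq < NR_IRQS);
    atomic_fetch_add(&pending[irq], 1);
}

/*
 * Interrupt-controller side: call when the line is free to deliver again.
 * Returns 1 if one pending tick was consumed (inject it now), 0 if none.
 */
static int take_pending(unsigned irq)
{
    assert(irq < NR_IRQS);
    unsigned old = atomic_load(&pending[irq]);
    while (old > 0) {
        /* On failure, old is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&pending[irq], &old, old - 1))
            return 1;
    }
    return 0;
}
```

With three calls to timer_tick(8) while the guest is busy, three later calls to take_pending(8) each return 1 and a fourth returns 0, so no tick is lost to coalescing.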
Before implementing yet another in-kernel device I'd like to hear
opinions.
Regards,
Dor.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
* Re: windows acpi time drift
2008-03-18 23:09 windows acpi time drift Dor Laor
@ 2008-03-18 23:33 ` Dor Laor
2008-03-18 23:57 ` Anthony Liguori
2008-03-18 23:35 ` Anthony Liguori
2008-03-19 0:07 ` Anthony Liguori
2 siblings, 1 reply; 10+ messages in thread
From: Dor Laor @ 2008-03-18 23:33 UTC (permalink / raw)
To: kvm-devel; +Cc: Avi Kivity
On Wed, 2008-03-19 at 01:09 +0200, Dor Laor wrote:
> After some research of time drift while using window windows acpi hal I
> discovered it uses the ... rtc timer as a source clock.
> Not the apic, acpi nor the pit. The acpi timer is not used by the time
> keeping clock, the apic & pit timer irqs are masked.
>
> In order to fix the time drift we need to fix the rtc emulation.
> The problem is that like the pit and the apic timers in userspace, the
> rtc also has inaccurate timer, thus leading to irq coalescing before
> getting acknowledged by the guest interrupt controller.
>
> We have two options:
> 1. Bring another device to the kernel
> - It's a simple device
> - It will make the rtc clock more accurate (hrtimer)
> - Easy time drift fix like apic/pic
> - It has very minor performance improvment of canceling the
> need to go to userspace after vmexit, thus not syncing vmcs.
> But it's only 15msec * 2 rate.
Hmm, when doing multimedia, Windows increases the rtc frequency from
64 Hz to 1024 Hz, so an in-kernel device would save 2048 userspace exits
per second. This might be a real improvement - small but measurable.
> but
> - both the pit & rtc are somehow code duplications from
> userspace. Both need more accurate timer + interface to
> detect irq acks by the pic/apic.
> 2. The other option is to have an accurate userspace timer (userspace
> hrtimer exist >= 2.6.24) and to add interface to pic/apic to queue
> pending irqs by the pit/rtc.
> The pending queue can be a simple atomic counter per irq.
> Note that we also need support for older host kernels.
>
> Before implementing yet another in-kernel device I like to hear
> opinions.
> Regards,
> Dor.
* Re: windows acpi time drift
2008-03-18 23:33 ` Dor Laor
@ 2008-03-18 23:57 ` Anthony Liguori
0 siblings, 0 replies; 10+ messages in thread
From: Anthony Liguori @ 2008-03-18 23:57 UTC (permalink / raw)
To: dor.laor; +Cc: kvm-devel, Avi Kivity
Dor Laor wrote:
> On Wed, 2008-03-19 at 01:09 +0200, Dor Laor wrote:
>
>> After some research of time drift while using window windows acpi hal I
>> discovered it uses the ... rtc timer as a source clock.
>> Not the apic, acpi nor the pit. The acpi timer is not used by the time
>> keeping clock, the apic & pit timer irqs are masked.
>>
>> In order to fix the time drift we need to fix the rtc emulation.
>> The problem is that like the pit and the apic timers in userspace, the
>> rtc also has inaccurate timer, thus leading to irq coalescing before
>> getting acknowledged by the guest interrupt controller.
>>
>> We have two options:
>> 1. Bring another device to the kernel
>> - It's a simple device
>> - It will make the rtc clock more accurate (hrtimer)
>> - Easy time drift fix like apic/pic
>> - It has very minor performance improvment of canceling the
>> need to go to userspace after vmexit, thus not syncing vmcs.
>> But it's only 15msec * 2 rate.
>>
>
> Hmm, when doing multimedia, windows increases the frequency of rtc from
> 64HZ to 1024HZ, thus in-kernel device will save 2048 user-space exits.
> This might be a real improvement - small but measurable.
>
Well, let's calculate it. I measure a lightweight exit at 3590 cycles
and a heavyweight exit at 8548. If we look at the cost of dropping to
userspace an extra 2048 times, since I have a 2.2 GHz chip, we're
looking at an additional .0046 seconds to transition to userspace. This
is on top of the base .0033 seconds that it takes to take these exits in
the first place. These are dummy exits though, so when you add in the
cost of processing this (which I think is roughly equal whether in
kernelspace or userspace) I think those values will quickly be overwhelmed.
So, at the end of the day, you're not getting even a 1% performance
boost and you're adding a lot of complexity to the kernel. This
strongly suggests to me that we should be doing this in userspace.
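
The arithmetic above can be reproduced with a few lines of C. The cycle counts, the 2048 exits per second, and the 2.2 GHz clock are the figures quoted in this message; the helper name exit_seconds() is made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the message above. */
#define LIGHT_EXIT_CYCLES 3590u
#define HEAVY_EXIT_CYCLES 8548u
#define EXITS_PER_SEC     2048u
#define CPU_HZ            2200000000.0   /* 2.2 GHz */

/* CPU seconds consumed by `exits` exits costing `cycles_per_exit` each. */
static double exit_seconds(uint64_t exits, uint64_t cycles_per_exit,
                           double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)(exits * cycles_per_exit) / cpu_hz;
}
```

exit_seconds(EXITS_PER_SEC, LIGHT_EXIT_CYCLES, CPU_HZ) comes to about .0033 seconds, and exit_seconds(EXITS_PER_SEC, HEAVY_EXIT_CYCLES - LIGHT_EXIT_CYCLES, CPU_HZ) to about .0046 seconds, so the two together stay below 1% of each second.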
Regards,
Anthony Liguori
* Re: windows acpi time drift
2008-03-18 23:09 windows acpi time drift Dor Laor
2008-03-18 23:33 ` Dor Laor
@ 2008-03-18 23:35 ` Anthony Liguori
2008-03-19 8:19 ` Avi Kivity
2008-03-19 0:07 ` Anthony Liguori
2 siblings, 1 reply; 10+ messages in thread
From: Anthony Liguori @ 2008-03-18 23:35 UTC (permalink / raw)
To: dor.laor; +Cc: kvm-devel, Avi Kivity
Dor Laor wrote:
> After some research of time drift while using window windows acpi hal I
> discovered it uses the ... rtc timer as a source clock.
> Not the apic, acpi nor the pit. The acpi timer is not used by the time
> keeping clock, the apic & pit timer irqs are masked.
>
> In order to fix the time drift we need to fix the rtc emulation.
> The problem is that like the pit and the apic timers in userspace, the
> rtc also has inaccurate timer, thus leading to irq coalescing before
> getting acknowledged by the guest interrupt controller.
>
> We have two options:
> 1. Bring another device to the kernel
> - It's a simple device
> - It will make the rtc clock more accurate (hrtimer)
> - Easy time drift fix like apic/pic
> - It has very minor performance improvment of canceling the
> need to go to userspace after vmexit, thus not syncing vmcs.
> But it's only 15msec * 2 rate.
> but
> - both the pit & rtc are somehow code duplications from
> userspace. Both need more accurate timer + interface to
> detect irq acks by the pic/apic.
> 2. The other option is to have an accurate userspace timer (userspace
> hrtimer exist >= 2.6.24) and to add interface to pic/apic to queue
> pending irqs by the pit/rtc.
> The pending queue can be a simple atomic counter per irq.
> Note that we also need support for older host kernels.
>
Why don't we just introduce a vm-ioctl interface for a one-shot
programmable timer? It could be programmed in userspace, and when it
fires, we can drop down to userspace with a special exit code. We could
then introduce an interrupt queuing mechanism in the kernel specifically
for timer interrupts as you mention.
That lets us remove the in-kernel PIT, and makes all of our timer
mechanisms more accurate. If userspace has a better time mechanism,
like hrtimer, then it can just use that. If hrtimer is good enough in
userspace, then we can contain these new ioctls to the compat-code only.
Regards,
Anthony Liguori
> Before implementing yet another in-kernel device I like to hear
> opinions.
> Regards,
> Dor.
* Re: windows acpi time drift
2008-03-18 23:35 ` Anthony Liguori
@ 2008-03-19 8:19 ` Avi Kivity
2008-03-19 14:09 ` Anthony Liguori
0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2008-03-19 8:19 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-devel
Anthony Liguori wrote:
> Why don't we just introduce a vm-ioctl interface for a one-shot
> programmable timer? It could be programmed in userspace, and when it
> fires, we can drop down to userspace with a special exit code. We could
> then introduce an interrupt queuing mechanism in the kernel specifically
> for timer interrupts as you mention.
>
> That lets us remove the in-kernel PIT, and makes all of our timer
> mechanisms more accurate. If userspace has a better time mechanism,
> like hrtimer, then it can just use that. If hrtimer is good enough in
> userspace, then we can contain these new ioctls to the compat-code only.
>
The problems with timers are:
- on a loaded machine, several timer ticks may be coalesced together on
the host side; we need a way to detect overruns
- with one-shot processing, there is inevitable drift, so we need to
use periodic timers or to compensate for the drift
- when we have accumulated missed interrupts, we need to inject them
- we may need to coordinate tsc and timer values (like Xen)
The first two problems seem to be resolvable via posix timers
(timer_create() & friends). The third issue can be resolved by adding
an ioctl to queue a bunch of injections (raising and lowering a specific
line after the ack). The fourth is probably impossible from userspace
(and very difficult in the kernel).
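
The first two points (overrun detection and drift) come down to keeping absolute deadlines. For posix timers the overrun count is what timer_getoverrun() reports; the sketch below models the same bookkeeping as a pure function so the logic is visible. struct periodic and ticks_due() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

struct periodic {
    uint64_t next_ns;    /* absolute deadline of the next tick */
    uint64_t period_ns;  /* tick period */
};

/*
 * Called when the host timer actually fires at time now_ns.
 * Returns how many guest ticks are due (one plus any overruns) and
 * advances the deadline by whole periods, so a late wakeup does not
 * shift later deadlines and accumulate as drift.
 */
static uint64_t ticks_due(struct periodic *t, uint64_t now_ns)
{
    assert(t->period_ns > 0);
    if (now_ns < t->next_ns)
        return 0;
    uint64_t n = (now_ns - t->next_ns) / t->period_ns + 1;
    t->next_ns += n * t->period_ns;
    return n;
}
```

With a 1 ms period and a first deadline at 1 ms, a wakeup at 1 ms yields one tick; if the host is then delayed until 4.5 ms, the next call yields three ticks (for 2, 3 and 4 ms) and the following deadline is still exactly 5 ms.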
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: windows acpi time drift
2008-03-19 8:19 ` Avi Kivity
@ 2008-03-19 14:09 ` Anthony Liguori
2008-03-19 15:39 ` Avi Kivity
0 siblings, 1 reply; 10+ messages in thread
From: Anthony Liguori @ 2008-03-19 14:09 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel
Avi Kivity wrote:
> The fourth is probably impossible from userspace (and very difficult
> in the kernel).
What makes it impossible to do in userspace? If you managed a
tsc_offset in userspace, you would of course need to adjust that
tsc_offset within the kernel for the particular PCPU that you were on.
Regards,
Anthony Liguori
* Re: windows acpi time drift
2008-03-19 14:09 ` Anthony Liguori
@ 2008-03-19 15:39 ` Avi Kivity
2008-03-19 16:27 ` Dor Laor
0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2008-03-19 15:39 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-devel
Anthony Liguori wrote:
> Avi Kivity wrote:
>> The fourth is probably impossible from userspace (and very
>> difficult in the kernel).
>
> What makes it impossible to do in userspace? If you managed a
> tsc_offset in userspace, you would of course need to adjust that
> tsc_offset within the kernel for the particular PCPU that you were on.
>
In the kernel you can do tricks like local_irq_disable(); rdtsc();
ktime_get(); local_irq_enable() to get a sense of where the tsc is.
Take a look at kvm_inject_pit_timer_irqs() and
kvm_pit_timer_intr_post(). An attempt to have an accurate userspace pit
needs to take into account what those functions do. I believe it's
doable, but will require careful design of the interface (which should
be usable for rtc and hpet as well).
--
error compiling committee.c: too many arguments to function
* Re: windows acpi time drift
2008-03-19 15:39 ` Avi Kivity
@ 2008-03-19 16:27 ` Dor Laor
2008-03-19 17:45 ` Avi Kivity
0 siblings, 1 reply; 10+ messages in thread
From: Dor Laor @ 2008-03-19 16:27 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel
On Wed, 2008-03-19 at 17:39 +0200, Avi Kivity wrote:
> Anthony Liguori wrote:
> > Avi Kivity wrote:
> >> The fourth is probably impossible from userspace (and very
> >> difficult in the kernel).
> >
> > What makes it impossible to do in userspace? If you managed a
> > tsc_offset in userspace, you would of course need to adjust that
> > tsc_offset within the kernel for the particular PCPU that you were on.
> >
>
> In the kernel you can to tricks like local_irq_disable(); rdtsc();
> ktime_get(); local_irq_enable() to get a sense where the tsc is.
but you can also do it before the vcpu goes to userspace after vmexit.
>
> Take a look at kvm_inject_pit_timer_irqs() and
> kvm_pit_timer_intr_post(). An attempt to have a accurate userspace pit
> needs to take into account what those functions do. I believe it's
> doable, but will require careful design of the interface (which should
> be usable for rtc and hpet as well).
>
Actually I'm coming to think we don't need an irq queue in the kernel.
We just need to count the pending timer interrupts in userspace and
change the qemu_set_irq interface to return a status saying whether the
irq was really injected by the pic/apic (like kvm_pit_timer_intr_post).
This way qemu timer devices will not inject another irq until the
previous irq got acked by the kernel (or even userspace pic/apic).
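
A rough model of that scheme is sketched below, assuming a set_irq variant that reports whether the interrupt was really injected. All of the names (struct irq_line, struct timer_dev, set_irq_status() and so on) are hypothetical stand-ins, not the actual qemu_set_irq interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one interrupt line as seen by the pic/apic. */
struct irq_line {
    bool in_service;   /* raised and not yet acked by the guest */
};

/* Hypothetical timer device state kept in userspace. */
struct timer_dev {
    struct irq_line *line;
    unsigned pending;  /* ticks that expired but are not injected yet */
};

/* Stand-in for a set_irq that reports whether the irq was injected. */
static bool set_irq_status(struct irq_line *l)
{
    if (l->in_service)
        return false;          /* would have been coalesced */
    l->in_service = true;
    return true;
}

/* Try to deliver one pending tick; keep it counted if the line is busy. */
static void timer_flush(struct timer_dev *d)
{
    assert(d != NULL && d->line != NULL);
    if (d->pending > 0 && set_irq_status(d->line))
        d->pending--;
}

/* Host timer expiry: count the tick, then try to deliver it. */
static void timer_expired(struct timer_dev *d)
{
    d->pending++;
    timer_flush(d);
}

/* Guest ack: free the line, then retry a counted tick. */
static void guest_ack(struct timer_dev *d)
{
    d->line->in_service = false;
    timer_flush(d);
}
```

Three expiries before the first ack leave one tick in service and two counted as pending; each subsequent guest_ack() delivers one more, so the guest eventually sees all three.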
* Re: windows acpi time drift
2008-03-19 16:27 ` Dor Laor
@ 2008-03-19 17:45 ` Avi Kivity
0 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2008-03-19 17:45 UTC (permalink / raw)
To: dor.laor; +Cc: kvm-devel
Dor Laor wrote:
> On Wed, 2008-03-19 at 17:39 +0200, Avi Kivity wrote:
>
>> Anthony Liguori wrote:
>>
>>> Avi Kivity wrote:
>>>
>>>> The fourth is probably impossible from userspace (and very
>>>> difficult in the kernel).
>>>>
>>> What makes it impossible to do in userspace? If you managed a
>>> tsc_offset in userspace, you would of course need to adjust that
>>> tsc_offset within the kernel for the particular PCPU that you were on.
>>>
>>>
>> In the kernel you can to tricks like local_irq_disable(); rdtsc();
>> ktime_get(); local_irq_enable() to get a sense where the tsc is.
>>
>
> but you can also do it before the vcpu goes to userspace after vmexit.
>
>
You only want to do it when needed. We might add an ioctl for it, but
it's tricky.
>> Take a look at kvm_inject_pit_timer_irqs() and
>> kvm_pit_timer_intr_post(). An attempt to have a accurate userspace pit
>> needs to take into account what those functions do. I believe it's
>> doable, but will require careful design of the interface (which should
>> be usable for rtc and hpet as well).
>>
>>
>
> Actually I'm coming to think we don't need a irq queue in the kernel.
> We just need to count the pending timer interrupts in userspace and
> change the qemu_set_irq interface to return a status when the irq was
> really injected by pic/apic (like kvm_pit_timer_intr_post).
>
> This way qemu timer devices will not inject another irq until the
> previous irq got ack by the kernel (or even userspace pic/acpi).
>
Yes, I think you're right. We can return the information in the vcpu
shared area, so it doesn't generate new exits.
--
error compiling committee.c: too many arguments to function
* Re: windows acpi time drift
2008-03-18 23:09 windows acpi time drift Dor Laor
2008-03-18 23:33 ` Dor Laor
2008-03-18 23:35 ` Anthony Liguori
@ 2008-03-19 0:07 ` Anthony Liguori
2 siblings, 0 replies; 10+ messages in thread
From: Anthony Liguori @ 2008-03-19 0:07 UTC (permalink / raw)
To: dor.laor; +Cc: kvm-devel, Avi Kivity
Dor Laor wrote:
> 2. The other option is to have an accurate userspace timer (userspace
> hrtimer exist >= 2.6.24) and to add interface to pic/apic to queue
> pending irqs by the pit/rtc.
> The pending queue can be a simple atomic counter per irq.
>
So this may explain why I see no appreciable benefit from using the
in-kernel PIT versus using the userspace PIT and -tdf. I'm using a
2.6.24 host kernel and QEMU by default will use the "unix" time source
which will use setitimer. IIUC, itimer will use hrtimers.
So if there isn't an appreciable CPU utilization improvement and there
isn't increased accuracy, there won't be an improvement.
Do y'all see an improvement playing multimedia on a 2.6.24 host with the
in-kernel PIT?
Regards,
Anthony Liguori
> Note that we also need support for older host kernels.
>
> Before implementing yet another in-kernel device I like to hear
> opinions.
> Regards,
> Dor.