public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/4] KVM in-kernel PM Timer implementation
       [not found] <344060531.680691292328457867.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
@ 2010-12-14 12:09 ` Ulrich Obergfell
  2010-12-14 13:34   ` Avi Kivity
  2010-12-14 15:29   ` Anthony Liguori
  0 siblings, 2 replies; 19+ messages in thread
From: Ulrich Obergfell @ 2010-12-14 12:09 UTC (permalink / raw)
  To: kvm; +Cc: glommer, zamsden, avi, mtosatti


Hi,

This is an RFC through which I would like to get feedback on how the
idea of in-kernel PM Timer would be received.

The current implementation of PM Timer emulation is 'heavy-weight'
because the code resides in qemu userspace. Guest operating systems
that use PM Timer as a clock source (for example, older versions of
Linux that do not have paravirtualized clock) would benefit from an
in-kernel PM Timer emulation.

Parts 1 thru 4 of this RFC contain experimental source code which
I recently used to investigate the performance benefit. In a Linux
guest, I was running a program that calls gettimeofday() 'n' times
in a loop (the PM Timer register is read during each call). With
in-kernel PM Timer, I observed a significant reduction of program
execution time.

The experimental code emulates the PM Timer register in KVM kernel.
All other components of ACPI PM remain in qemu userspace. Also, the
'timer carry interrupt' feature is not implemented in-kernel. If a
guest operating system needs to enable the 'timer carry interrupt',
the code takes care that PM Timer emulation falls back to userspace.
However, I think the design of the code has sufficient flexibility,
so that anyone who would want to add the 'timer carry interrupt'
feature in-kernel could try to do so later on.
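
[As a rough illustration of what the in-kernel read path has to compute — symbol names here are invented, not taken from the patches: the PM Timer is a free-running counter at 3.579545 MHz, truncated to 24 bits in qemu's ACPI PM device, so a register read derives the count from guest time.]

```c
#include <stdint.h>

#define PMTMR_FREQ_HZ 3579545ULL   /* ACPI PM Timer tick rate */
#define PMTMR_MASK    0xFFFFFFULL  /* 24-bit counter */

/* Hypothetical sketch, not the RFC's actual code: derive the PM Timer
 * count from guest nanoseconds.  A real implementation must avoid the
 * 64-bit multiply overflow (e.g. with a 128-bit intermediate or the
 * kernel's scaled-math helpers); this plain form is only valid for
 * small inputs. */
uint32_t pmtmr_count(uint64_t guest_ns)
{
    return (uint32_t)(guest_ns * PMTMR_FREQ_HZ / 1000000000ULL & PMTMR_MASK);
}
```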

Please review and please comment.


Regards,

Uli Obergfell

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 12:09 ` Ulrich Obergfell
@ 2010-12-14 13:34   ` Avi Kivity
  2010-12-14 13:40     ` Glauber Costa
  2010-12-14 15:29   ` Anthony Liguori
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2010-12-14 13:34 UTC (permalink / raw)
  To: Ulrich Obergfell; +Cc: kvm, glommer, zamsden, mtosatti

On 12/14/2010 02:09 PM, Ulrich Obergfell wrote:
> Hi,
>
> This is an RFC through which I would like to get feedback on how the
> idea of in-kernel PM Timer would be received.
>
> The current implementation of PM Timer emulation is 'heavy-weight'
> because the code resides in qemu userspace. Guest operating systems
> that use PM Timer as a clock source (for example, older versions of
> Linux that do not have paravirtualized clock) would benefit from an
> in-kernel PM Timer emulation.
>
> Parts 1 thru 4 of this RFC contain experimental source code which
> I recently used to investigate the performance benefit. In a Linux
> guest, I was running a program that calls gettimeofday() 'n' times
> in a loop (the PM Timer register is read during each call). With
> in-kernel PM Timer, I observed a significant reduction of program
> execution time.
>
> The experimental code emulates the PM Timer register in KVM kernel.
> All other components of ACPI PM remain in qemu userspace. Also, the
> 'timer carry interrupt' feature is not implemented in-kernel. If a
> guest operating system needs to enable the 'timer carry interrupt',
> the code takes care that PM Timer emulation falls back to userspace.
> However, I think the design of the code has sufficient flexibility,
> so that anyone who would want to add the 'timer carry interrupt'
> feature in-kernel could try to do so later on.
>

What is the motivation for this?  Are there any important guests that 
use the pmtimer?

If anything I'd expect hpet or the Microsoft synthetic timers to be a 
lot more important.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 13:34   ` Avi Kivity
@ 2010-12-14 13:40     ` Glauber Costa
  2010-12-14 13:49       ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Glauber Costa @ 2010-12-14 13:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Ulrich Obergfell, kvm, zamsden, mtosatti

On Tue, 2010-12-14 at 15:34 +0200, Avi Kivity wrote:
> On 12/14/2010 02:09 PM, Ulrich Obergfell wrote:
> > Hi,
> >
> > This is an RFC through which I would like to get feedback on how the
> > idea of in-kernel PM Timer would be received.
> >
> > The current implementation of PM Timer emulation is 'heavy-weight'
> > because the code resides in qemu userspace. Guest operating systems
> > that use PM Timer as a clock source (for example, older versions of
> > Linux that do not have paravirtualized clock) would benefit from an
> > in-kernel PM Timer emulation.
> >
> > Parts 1 thru 4 of this RFC contain experimental source code which
> > I recently used to investigate the performance benefit. In a Linux
> > guest, I was running a program that calls gettimeofday() 'n' times
> > in a loop (the PM Timer register is read during each call). With
> > in-kernel PM Timer, I observed a significant reduction of program
> > execution time.
> >
> > The experimental code emulates the PM Timer register in KVM kernel.
> > All other components of ACPI PM remain in qemu userspace. Also, the
> > 'timer carry interrupt' feature is not implemented in-kernel. If a
> > guest operating system needs to enable the 'timer carry interrupt',
> > the code takes care that PM Timer emulation falls back to userspace.
> > However, I think the design of the code has sufficient flexibility,
> > so that anyone who would want to add the 'timer carry interrupt'
> > feature in-kernel could try to do so later on.
> >
> 
> What is the motivation for this?  Are there any important guests that 
> use the pmtimer?
Avi,

All older RHEL and Windows guests, for example, would benefit from this.

> If anything I'd expect hpet or the Microsoft synthetic timers to be a 
> lot more important.

True. But also a lot more work.
Implementing just the PM timer counter - not the whole device - in the
kernel gives us a lot of gain for very little effort. The patch is
pretty simple, as you can see, and most of it is even code to turn it
on/off, etc.





* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 13:40     ` Glauber Costa
@ 2010-12-14 13:49       ` Avi Kivity
  2010-12-14 13:52         ` Gleb Natapov
  2010-12-14 15:32         ` Anthony Liguori
  0 siblings, 2 replies; 19+ messages in thread
From: Avi Kivity @ 2010-12-14 13:49 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Ulrich Obergfell, kvm, zamsden, mtosatti

On 12/14/2010 03:40 PM, Glauber Costa wrote:
> >
> >  What is the motivation for this?  Are there any important guests that
> >  use the pmtimer?
> Avi,
>
> All older RHEL and Windows, for example, would benefit for this.

They only benefit from it because we don't provide HPET.  If we did, the 
guests would use HPET in preference to pmtimer, since HPET is so much 
better than pmtimer (yet still sucks in an absolute sense).

> >  If anything I'd expect hpet or the Microsoft synthetic timers to be a
> >  lot more important.
>
> True. But also a lot more work.
> Implementing just the pm timer counter - not the whole of it - in
> kernel, gives us a lot of gain with not very much effort. Patch is
> pretty simple, as you can see, and most of it is even code to turn it
> on/off, etc.
>

Partial emulation is not something I like since it causes a fuzzy 
kernel/user boundary.  In this case, transitioning to userspace when 
interrupts are enabled doesn't look so hot.  Are you sure all guests 
that benefit from this don't enable the pmtimer interrupt?  What about 
the transition?  Will we have a time discontinuity when that happens?

What I'd really like to see is this stuff implemented in bytecode, 
unfortunately that's a lot of work which will be very hard to upstream.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 13:49       ` Avi Kivity
@ 2010-12-14 13:52         ` Gleb Natapov
  2010-12-14 15:32         ` Anthony Liguori
  1 sibling, 0 replies; 19+ messages in thread
From: Gleb Natapov @ 2010-12-14 13:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Glauber Costa, Ulrich Obergfell, kvm, zamsden, mtosatti

On Tue, Dec 14, 2010 at 03:49:37PM +0200, Avi Kivity wrote:
> On 12/14/2010 03:40 PM, Glauber Costa wrote:
> >>
> >>  What is the motivation for this?  Are there any important guests that
> >>  use the pmtimer?
> >Avi,
> >
> >All older RHEL and Windows, for example, would benefit for this.
> 
> They only benefit from it because we don't provide HPET.  If we did,
> the guests would use HPET in preference to pmtimer, since HPET is so
> much better than pmtimer (yet still sucks in an absolute sense).
> 
> >>  If anything I'd expect hpet or the Microsoft synthetic timers to be a
> >>  lot more important.
> >
> >True. But also a lot more work.
> >Implementing just the pm timer counter - not the whole of it - in
> >kernel, gives us a lot of gain with not very much effort. Patch is
> >pretty simple, as you can see, and most of it is even code to turn it
> >on/off, etc.
> >
> 
> Partial emulation is not something I like since it causes a fuzzy
> kernel/user boundary.  In this case, transitioning to userspace when
> interrupts are enabled doesn't look so hot.  Are you sure all guests
> that benefit from this don't enable the pmtimer interrupt?  What
> about the transition?  Will we have a time discontinuity when that
> happens?
> 
> What I'd really like to see is this stuff implemented in bytecode,
> unfortunately that's a lot of work which will be very hard to
> upstream.
>
<joke> 
Just use ACPI bytecode. It is upstream already.
</joke>

--
			Gleb.


* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
       [not found] <953393305.700721292337871455.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
@ 2010-12-14 14:44 ` Ulrich Obergfell
  2010-12-14 15:12   ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Ulrich Obergfell @ 2010-12-14 14:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, zamsden, mtosatti, Glauber Costa


----- "Avi Kivity" <avi@redhat.com> wrote:

> On 12/14/2010 03:40 PM, Glauber Costa wrote:
> > >
> > >  What is the motivation for this?  Are there any important guests that
> > >  use the pmtimer?
> > Avi,
> >
> > All older RHEL and Windows, for example, would benefit for this.
> 
> They only benefit from it because we don't provide HPET.  If we did, the 
> guests would use HPET in preference to pmtimer, since HPET is so much
> better than pmtimer (yet still sucks in an absolute sense).
> 
> > >  If anything I'd expect hpet or the Microsoft synthetic timers to be a
> > >  lot more important.
> >
> > True. But also a lot more work.
> > Implementing just the pm timer counter - not the whole of it - in
> > kernel, gives us a lot of gain with not very much effort. Patch is
> > pretty simple, as you can see, and most of it is even code to turn it
> > on/off, etc.
> >
> 
> Partial emulation is not something I like since it causes a fuzzy 
> kernel/user boundary.  In this case, transitioning to userspace when 
> interrupts are enabled doesn't look so hot.  Are you sure all guests 
> that benefit from this don't enable the pmtimer interrupt?  What about
> the transition?  Will we have a time discontinuity when that happens?

Avi,

the idea is to use the '-kvm-pmtmr' option (in code part 4) only
with guests that do not enable the 'timer carry interrupt'. Guests
that need to enable the 'timer carry interrupt' should rather use
the PM Timer emulation in qemu userspace (i.e. they should not be
started with this option). If a guest is accidentally started with
this option, the in-kernel PM Timer (in code part 1) detects if
the guest attempts to enable the 'timer carry interrupt' and falls
back to PM Timer emulation in qemu userspace (in-kernel PM Timer
disables itself automatically). So, this is not a combination of
in-kernel PM Timer register emulation and qemu userspace PM Timer
interrupt emulation.
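
[The auto-disable behaviour described above can be pictured as a check on guest writes to the PM1 event-enable register, where ACPI defines TMR_EN as bit 0. This is a hypothetical sketch; function and flag names are illustrative, not the RFC's actual symbols.]

```c
#include <stdint.h>

#define ACPI_PM1_TMR_EN (1u << 0)  /* timer carry interrupt enable bit */

/* Hypothetical sketch of the fallback logic: on a guest write to the
 * PM1 enable register, if the guest sets TMR_EN, mark the in-kernel
 * device inactive so subsequent PM Timer accesses exit to qemu
 * userspace.  Returns 1 while in-kernel emulation remains active,
 * 0 after falling back. */
int pmtmr_pm1_enable_write(uint16_t val, int *in_kernel_active)
{
    if (val & ACPI_PM1_TMR_EN)
        *in_kernel_active = 0;   /* fall back to userspace emulation */
    return *in_kernel_active;
}
```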

Regards,

Uli


* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 14:44 ` [RFC 0/4] KVM in-kernel PM Timer implementation Ulrich Obergfell
@ 2010-12-14 15:12   ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2010-12-14 15:12 UTC (permalink / raw)
  To: Ulrich Obergfell; +Cc: kvm, zamsden, mtosatti, Glauber Costa

On 12/14/2010 04:44 PM, Ulrich Obergfell wrote:
> >
> >  Partial emulation is not something I like since it causes a fuzzy
> >  kernel/user boundary.  In this case, transitioning to userspace when
> >  interrupts are enabled doesn't look so hot.  Are you sure all guests
> >  that benefit from this don't enable the pmtimer interrupt?  What about
> >  the transition?  Will we have a time discontinuity when that happens?
>
> Avi,
>
> the idea is to use the '-kvm-pmtmr' option (in code part 4) only
> with guests that do not enable the 'timer carry interrupt'. Guests
> that need to enable the 'timer carry interrupt' should rather use
> the PM Timer emulation in qemu userspace (i.e. they should not be
> started with this option). If a guest is accidentally started with
> this option, the in-kernel PM Timer (in code part 1) detects if
> the guest attempts to enable the 'timer carry interrupt' and falls
> back to PM Timer emulation in qemu userspace (in-kernel PM Timer
> disables itself automatically). So, this is not a combination of
> in-kernel PM Timer register emulation and qemu userspace PM Timer
> interrupt emulation.
>

We really try to avoid guest specific parameters.  Having to decide if 
the guest has virtio is bad enough, but going into low level details 
like that is really bad.  The host admin might not even know what 
operating systems its guests run.

A guest might even dual boot two different operating systems.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 12:09 ` Ulrich Obergfell
  2010-12-14 13:34   ` Avi Kivity
@ 2010-12-14 15:29   ` Anthony Liguori
  2010-12-14 18:00     ` David S. Ahern
  1 sibling, 1 reply; 19+ messages in thread
From: Anthony Liguori @ 2010-12-14 15:29 UTC (permalink / raw)
  To: Ulrich Obergfell; +Cc: kvm, glommer, zamsden, avi, mtosatti

On 12/14/2010 06:09 AM, Ulrich Obergfell wrote:
> Hi,
>
> This is an RFC through which I would like to get feedback on how the
> idea of in-kernel PM Timer would be received.
>
> The current implementation of PM Timer emulation is 'heavy-weight'
> because the code resides in qemu userspace. Guest operating systems
> that use PM Timer as a clock source (for example, older versions of
> Linux that do not have paravirtualized clock) would benefit from an
> in-kernel PM Timer emulation.
>
> Parts 1 thru 4 of this RFC contain experimental source code which
> I recently used to investigate the performance benefit. In a Linux
> guest, I was running a program that calls gettimeofday() 'n' times
> in a loop (the PM Timer register is read during each call). With
> in-kernel PM Timer, I observed a significant reduction of program
> execution time.
>    

I've played with this in the past.  Can you post real numbers, 
preferably with a real workload?

Regards,

Anthony Liguori

> The experimental code emulates the PM Timer register in KVM kernel.
> All other components of ACPI PM remain in qemu userspace. Also, the
> 'timer carry interrupt' feature is not implemented in-kernel. If a
> guest operating system needs to enable the 'timer carry interrupt',
> the code takes care that PM Timer emulation falls back to userspace.
> However, I think the design of the code has sufficient flexibility,
> so that anyone who would want to add the 'timer carry interrupt'
> feature in-kernel could try to do so later on.
>
> Please review and please comment.
>
>
> Regards,
>
> Uli Obergfell
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    



* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 13:49       ` Avi Kivity
  2010-12-14 13:52         ` Gleb Natapov
@ 2010-12-14 15:32         ` Anthony Liguori
  2010-12-14 15:38           ` Avi Kivity
  1 sibling, 1 reply; 19+ messages in thread
From: Anthony Liguori @ 2010-12-14 15:32 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Glauber Costa, Ulrich Obergfell, kvm, zamsden, mtosatti

On 12/14/2010 07:49 AM, Avi Kivity wrote:
> On 12/14/2010 03:40 PM, Glauber Costa wrote:
>> >
>> >  What is the motivation for this?  Are there any important guests that
>> >  use the pmtimer?
>> Avi,
>>
>> All older RHEL and Windows, for example, would benefit for this.
>
> They only benefit from it because we don't provide HPET.  If we did, 
> the guests would use HPET in preference to pmtimer, since HPET is so 
> much better than pmtimer (yet still sucks in an absolute sense).
>
>> >  If anything I'd expect hpet or the Microsoft synthetic timers to be a
>> >  lot more important.
>>
>> True. But also a lot more work.
>> Implementing just the pm timer counter - not the whole of it - in
>> kernel, gives us a lot of gain with not very much effort. Patch is
>> pretty simple, as you can see, and most of it is even code to turn it
>> on/off, etc.
>>
>
> Partial emulation is not something I like since it causes a fuzzy 
> kernel/user boundary.  In this case, transitioning to userspace when 
> interrupts are enabled doesn't look so hot.  Are you sure all guests 
> that benefit from this don't enable the pmtimer interrupt?  What about 
> the transition?  Will we have a time discontinuity when that happens?
>
> What I'd really like to see is this stuff implemented in bytecode, 
> unfortunately that's a lot of work which will be very hard to upstream.

Fortunately, we have a very good bytecode interpreter that's accelerated 
in the kernel called KVM ;-)

Why not have the equivalent of a paravirtual SMM mode where we can 
reflect IO exits back to the guest in a well defined way?  It could then 
implement PM timer in terms of HPET or something like that.

We already have a virtual address space that works for most guests 
thanks to the TPR optimization.

Regards,

Anthony Liguori




* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 15:32         ` Anthony Liguori
@ 2010-12-14 15:38           ` Avi Kivity
  2010-12-14 16:04             ` Anthony Liguori
  0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2010-12-14 15:38 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Glauber Costa, Ulrich Obergfell, kvm, zamsden, mtosatti

On 12/14/2010 05:32 PM, Anthony Liguori wrote:
>>
>>> >  If anything I'd expect hpet or the Microsoft synthetic timers to 
>>> be a
>>> >  lot more important.
>>>
>>> True. But also a lot more work.
>>> Implementing just the pm timer counter - not the whole of it - in
>>> kernel, gives us a lot of gain with not very much effort. Patch is
>>> pretty simple, as you can see, and most of it is even code to turn it
>>> on/off, etc.
>>>
>>
>> Partial emulation is not something I like since it causes a fuzzy 
>> kernel/user boundary.  In this case, transitioning to userspace when 
>> interrupts are enabled doesn't look so hot.  Are you sure all guests 
>> that benefit from this don't enable the pmtimer interrupt?  What 
>> about the transition?  Will we have a time discontinuity when that 
>> happens?
>>
>> What I'd really like to see is this stuff implemented in bytecode, 
>> unfortunately that's a lot of work which will be very hard to upstream.
>
>
> Fortunately, we have a very good bytecode interpreter that's 
> accelerated in the kernel called KVM ;-)

We have exactly the same bytecode interpreter under a different name, 
it's called userspace.

If you can afford to make the transition back to the guest for 
emulation, you might as well transition to userspace.

>
> Why not have the equivalent of a paravirtual SMM mode where we can 
> reflect IO exits back to the guest in a well defined way?  It could 
> then implement PM timer in terms of HPET or something like that.

More exits.

>
> We already have a virtual address space that works for most guests 
> thanks to the TPR optimization.

It only works for Windows XP and Windows XP with the /3GB extension.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 15:38           ` Avi Kivity
@ 2010-12-14 16:04             ` Anthony Liguori
  2010-12-15  9:33               ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Anthony Liguori @ 2010-12-14 16:04 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Glauber Costa, Ulrich Obergfell, kvm, zamsden, mtosatti

On 12/14/2010 09:38 AM, Avi Kivity wrote:
> Fortunately, we have a very good bytecode interpreter that's 
> accelerated in the kernel called KVM ;-)
>
> We have exactly the same bytecode interpreter under a different name, 
> it's called userspace.
>
> If you can afford to make the transition back to the guest for 
> emulation, you might as well transition to userspace.

If you re-entered the guest and set up a stack that had the RIP of the 
source of the exit, then there's no additional need to exit the guest.  
The handler can just do an iret.  Or am I missing something?

>>
>> Why not have the equivalent of a paravirtual SMM mode where we can 
>> reflect IO exits back to the guest in a well defined way?  It could 
>> then implement PM timer in terms of HPET or something like that.
>
> More exits.

Yeah, I should have said, implement in terms of kvmclock so no 
additional exits.

>>
>> We already have a virtual address space that works for most guests 
>> thanks to the TPR optimization.
>
> It only works for Windows XP and Windows XP with the /3GB extension.

Is this a fundamental limitation or just a statement of today's 
heuristics?  Does any guest not keep the BIOS in virtual memory in a 
static location?

Regards,

Anthony Liguori




* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 15:29   ` Anthony Liguori
@ 2010-12-14 18:00     ` David S. Ahern
  2010-12-14 19:49       ` Anthony Liguori
  0 siblings, 1 reply; 19+ messages in thread
From: David S. Ahern @ 2010-12-14 18:00 UTC (permalink / raw)
  To: Ulrich Obergfell; +Cc: Anthony Liguori, kvm, glommer, zamsden, avi, mtosatti



On 12/14/10 08:29, Anthony Liguori wrote:

>> I recently used to investigate the performance benefit. In a Linux
>> guest, I was running a program that calls gettimeofday() 'n' times
>> in a loop (the PM Timer register is read during each call). With
>> in-kernel PM Timer, I observed a significant reduction of program
>> execution time.
>>    
> 
> I've played with this in the past.  Can you post real numbers,
> preferably, with a real work load?

2 years ago I posted relative comparisons of the time sources for older
RHEL guests:
http://www.mail-archive.com/kvm@vger.kernel.org/msg07231.html

What's the relative speed of the in-kernel pmtimer compared to the PIT?

David


* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 18:00     ` David S. Ahern
@ 2010-12-14 19:49       ` Anthony Liguori
  2010-12-14 19:54         ` David S. Ahern
  0 siblings, 1 reply; 19+ messages in thread
From: Anthony Liguori @ 2010-12-14 19:49 UTC (permalink / raw)
  To: David S. Ahern; +Cc: Ulrich Obergfell, kvm, glommer, zamsden, avi, mtosatti

On 12/14/2010 12:00 PM, David S. Ahern wrote:
>
> On 12/14/10 08:29, Anthony Liguori wrote:
>
>    
>>> I recently used to investigate the performance benefit. In a Linux
>>> guest, I was running a program that calls gettimeofday() 'n' times
>>> in a loop (the PM Timer register is read during each call). With
>>> in-kernel PM Timer, I observed a significant reduction of program
>>> execution time.
>>>
>>>        
>> I've played with this in the past.  Can you post real numbers,
>> preferably, with a real work load?
>>      
> 2 years ago I posted relative comparisons of the time sources for older
> RHEL guests:
> http://www.mail-archive.com/kvm@vger.kernel.org/msg07231.html
>    

Any time you write a program in userspace that effectively equates to a 
single PIO operation that is easy to emulate, it's going to be 
remarkably faster to implement that PIO emulation in the kernel than in 
userspace, because vmexit cost dominates the execution path.

But that doesn't tell you what the impact is in real world workloads.  
Before we start pushing all device emulation into the kernel, we need to 
quantify how often gettimeofday() is really called in real workloads.

Regards,

Anthony Liguori

> What's the relative speed of the in-kernel pmtimer compared to the PIT?
>
> David
>    



* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 19:49       ` Anthony Liguori
@ 2010-12-14 19:54         ` David S. Ahern
  2010-12-14 21:46           ` Anthony Liguori
  0 siblings, 1 reply; 19+ messages in thread
From: David S. Ahern @ 2010-12-14 19:54 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Ulrich Obergfell, kvm, glommer, zamsden, avi, mtosatti



On 12/14/10 12:49, Anthony Liguori wrote:
> But that doesn't tell you what the impact is in real world workloads. 
> Before we start pushing all device emulation into the kernel, we need to
> quantify how often gettimeofday() is really called in real workloads.

The workload that inspired that example program calls gtod upwards of
1000 times per second at its current max load. The overhead of
gettimeofday was the biggest factor when comparing performance to bare
metal and ESX. That's why I wrote the test program --- it boils a
complex product down to a single system call.

David

> 
> Regards,
> 
> Anthony Liguori
> 
>> What's the relative speed of the in-kernel pmtimer compared to the PIT?
>>
>> David
>>    
> 


* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 19:54         ` David S. Ahern
@ 2010-12-14 21:46           ` Anthony Liguori
  2010-12-14 23:59             ` David S. Ahern
  0 siblings, 1 reply; 19+ messages in thread
From: Anthony Liguori @ 2010-12-14 21:46 UTC (permalink / raw)
  To: David S. Ahern; +Cc: Ulrich Obergfell, kvm, glommer, zamsden, avi, mtosatti

On 12/14/2010 01:54 PM, David S. Ahern wrote:
>
> On 12/14/10 12:49, Anthony Liguori wrote:
>    
>> But that doesn't tell you what the impact is in real world workloads.
>> Before we start pushing all device emulation into the kernel, we need to
>> quantify how often gettimeofday() is really called in real workloads.
>>      
> The workload that inspired that example program at its current max load
> calls gtod upwards of 1000 times per second. The overhead of
> gettimeofday was the biggest factor when comparing performance to bare
> metal and esx. That's why I wrote the test program --- boils a complex
> product/program to a single system call.
>    

So the absolute performance impact was on the order of what?

The difference in CPU time of a lightweight vs. heavyweight exit 
should be something like 2-3us.  That would mean 2-3ms of CPU time at a 
rate of 1000 per second.

That should be pretty much in the noise.
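
[The arithmetic is easy to check with a back-of-envelope helper — nothing more than the calculation stated above.]

```c
/* Extra CPU consumed per second, as a percentage of one CPU, for a
 * given per-exit penalty in microseconds and an exit rate per second. */
double exit_overhead_pct(double extra_us_per_exit, double exits_per_sec)
{
    return extra_us_per_exit * exits_per_sec / 1e6 * 100.0;
}
```

At 2-3 us of extra cost and 1000 exits per second, this comes out to 0.2-0.3% of one CPU.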

There are possible second-order effects that might make a larger impact, 
such as contention on the qemu_mutex.  It's worth experimenting to see 
whether a non-mutex-acquiring fast path in userspace would also yield a 
significant performance boost.

Regards,

Anthony Liguori

> David
>
>    
>> Regards,
>>
>> Anthony Liguori
>>
>>      
>>> What's the relative speed of the in-kernel pmtimer compared to the PIT?
>>>
>>> David
>>>
>>>        
>>      



* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 21:46           ` Anthony Liguori
@ 2010-12-14 23:59             ` David S. Ahern
  0 siblings, 0 replies; 19+ messages in thread
From: David S. Ahern @ 2010-12-14 23:59 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Ulrich Obergfell, kvm, glommer, zamsden, avi, mtosatti



On 12/14/10 14:46, Anthony Liguori wrote:
> On 12/14/2010 01:54 PM, David S. Ahern wrote:
>>
>> On 12/14/10 12:49, Anthony Liguori wrote:
>>   
>>> But that doesn't tell you what the impact is in real world workloads.
>>> Before we start pushing all device emulation into the kernel, we need to
>>> quantify how often gettimeofday() is really called in real workloads.
>>>      
>> The workload that inspired that example program at its current max load
>> calls gtod upwards of 1000 times per second. The overhead of
>> gettimeofday was the biggest factor when comparing performance to bare
>> metal and esx. That's why I wrote the test program --- boils a complex
>> product/program to a single system call.
>>    
> 
> So the absolute performance impact was on the order of what?

At the time I did the investigations (18-24 months ago) KVM was on the
order of 15-20% worse for a RHEL4 based workload and the overhead
appeared to be due to the PIT or PM timer as the clock source. Switching
the clock to the TSC brought the performance on par with bare metal, but
that route has other issues.

> 
> The difference in CPU time of a light weight vs. heavy weight exit
> should be something like 2-3us.  That would mean 2-3ms of CPU time at a
> rate of 1000 per second.

The PIT causes 3 VMEXITs for each gettimeofday (get_offset_pit in RHEL4):

	/* timer count may underflow right here */
	outb_p(0x00, PIT_MODE);	/* latch the count ASAP */
...
	count = inb_p(PIT_CH0);	/* read the latched count */
...
	count |= inb_p(PIT_CH0) << 8;
...


David


> 
> That should be pretty much in the noise.
> 
> There are possibly second order effects that might make a large impact
> such as contention with the qemu_mutex.  It's worth doing
> experimentation to see if a non-mutex acquiring fast path in userspace
> also resulted in a significant performance boost.
> 
> Regards,
> 
> Anthony Liguori
> 
>> David
>>
>>   
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>     
>>>> What's the relative speed of the in-kernel pmtimer compared to the PIT?
>>>>
>>>> David
>>>>
>>>>        
>>>      
> 


* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-14 16:04             ` Anthony Liguori
@ 2010-12-15  9:33               ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2010-12-15  9:33 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Glauber Costa, Ulrich Obergfell, kvm, zamsden, mtosatti

On 12/14/2010 06:04 PM, Anthony Liguori wrote:
> On 12/14/2010 09:38 AM, Avi Kivity wrote:
>> Fortunately, we have a very good bytecode interpreter that's 
>> accelerated in the kernel called KVM ;-)
>>
>> We have exactly the same bytecode interpreter under a different name, 
>> it's called userspace.
>>
>> If you can afford to make the transition back to the guest for 
>> emulation, you might as well transition to userspace.
>
> If you re-entered the guest and setup a stack that had the RIP of the 
> source of the exit, then there's no additional need to exit the 
> guest.  The handler can just do an iret.  Or am I missing something?

I didn't even consider an iret-to-guest, to be honest.  Let's consider 
the options:

- iret-to-guest (a la tpr patching) - need to have an executable page 
in the guest virtual address space and some stack space (on 64-bit, can 
rely on iretq switching the stack).  That is probably impossible to do 
in a generic way without guest cooperation.  If we rely on guest 
cooperation, we might as well have the guest patch the IN instruction 
itself (no exits at all).

- architectural SMM - no need to find a virtual mapping, or even a 
physical page, since we're in our own physical address space.  However, 
the RSM instruction will trap, and on Intel, at least the first few 
instructions need to be emulated since SMM starts in big real mode.  
Also needs a tlb flush.

- kvm-specific SMM (probably what you referred to as "paravirt SMM", but 
if the guest OS is not involved, it's not really paravirt) - can switch 
to our own cr3 so no problem with finding a virtual mapping; however 
still needs a tlb flush, and on pre-NPT/EPT machines, switching cr3 back 
will involve an exit.

>>>
>>> We already have a virtual address space that works for most guests 
>>> thanks to the TPR optimization.
>>
>> It only works for Windows XP and Windows XP with the /3GB extension.
>
> Is this a fundamental limitation or just a statement of today's 
> heuristics?  Does any guest not keep the BIOS in virtual memory in a 
> static location?

If you're looking for a fundamental limitation, then yes, a guest need 
not map the BIOS at all.  Practically, I believe all common guests do map 
the BIOS, but IIRC modern guests use non-executable mappings.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
       [not found] <1956121317.795411292413874075.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
@ 2010-12-15 11:53 ` Ulrich Obergfell
  2012-02-21 18:10   ` Peter Lieven
  0 siblings, 1 reply; 19+ messages in thread
From: Ulrich Obergfell @ 2010-12-15 11:53 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm, glommer, zamsden, avi, mtosatti


----- "Anthony Liguori" <anthony@codemonkey.ws> wrote:

> On 12/14/2010 06:09 AM, Ulrich Obergfell wrote:

[...]

> > Parts 1 thru 4 of this RFC contain experimental source code which
> > I recently used to investigate the performance benefit. In a Linux
> > guest, I was running a program that calls gettimeofday() 'n' times
> > in a loop (the PM Timer register is read during each call). With
> > in-kernel PM Timer, I observed a significant reduction of program
> > execution time.
> >    
> 
> I've played with this in the past.  Can you post real numbers, 
> preferably, with a real work load?


Anthony,

I only experimented with a gettimeofday() loop. With this test scenario
I observed that in-kernel PM Timer reduced the program execution time to
roughly half of the execution time that it takes with userspace PM Timer.
Please find some example results below (these results were obtained while
the host was not busy). The relative difference of in-kernel PM Timer
versus userspace PM Timer is high, whereas the absolute difference per
call appears to be low. So, the benefit depends largely on how frequently
gettimeofday() is called in a real work load. I don't have any numbers
from a real work load. When I began working on this, I was motivated by
the fact that the Linux kernel itself provides an optimization for the
gettimeofday() call ('vxtime'). So, from this I presumed that there
would be real work loads which would benefit from the optimization of
the gettimeofday() call (otherwise, why would we have 'vxtime'?).
Of course, 'vxtime' is not related to PM based time keeping. However,
the experimental code shows an approach to optimize gettimeofday() in
KVM virtual machines.


Regards,

Uli


- host:

# grep "model name" /proc/cpuinfo | sort | uniq -c
      8 model name : Intel(R) Core(TM) i7 CPU       Q 740  @ 1.73GHz

# uname -r
2.6.37-rc4


- guest:

# grep "model name" /proc/cpuinfo | sort | uniq -c
      4 model name : QEMU Virtual CPU version 0.13.50


- test program ('gtod.c'):

#include <sys/time.h>
#include <stdlib.h>

struct timeval tv;

int main(int argc, char *argv[])
{
	int i = atoi(argv[1]);	/* number of gettimeofday() calls */

	while (i-- > 0)
		gettimeofday(&tv, NULL);
	return 0;
}


- example results with in-kernel PM Timer:

# for i in 1 2 3
> do
> time ./gtod 25000000
> done

real	0m44.302s
user	0m1.090s
sys	0m43.163s

real	0m44.509s
user	0m1.100s
sys	0m43.393s

real	0m45.290s
user	0m1.160s
sys	0m44.123s

# for i in 10000000 50000000 100000000
> do
> time ./gtod $i
> done

real	0m17.981s
user	0m0.810s
sys	0m17.157s

real	1m27.253s
user	0m1.930s
sys	1m25.307s

real	2m51.801s
user	0m3.359s
sys	2m48.384s


- example results with userspace PM Timer:

# for i in 1 2 3
> do
> time ./gtod 25000000
> done

real	1m24.185s
user	0m2.000s
sys	1m22.168s

real	1m23.508s
user	0m1.750s
sys	1m21.738s

real	1m24.437s
user	0m1.900s
sys	1m22.517s

# for i in 10000000 50000000 100000000
> do
> time ./gtod $i
> done

real	0m33.479s
user	0m0.680s
sys	0m32.785s

real	2m50.831s
user	0m3.389s
sys	2m47.405s

real	5m42.304s
user	0m7.319s
sys	5m34.919s


* Re: [RFC 0/4] KVM in-kernel PM Timer implementation
  2010-12-15 11:53 ` Ulrich Obergfell
@ 2012-02-21 18:10   ` Peter Lieven
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Lieven @ 2012-02-21 18:10 UTC (permalink / raw)
  To: Ulrich Obergfell; +Cc: Anthony Liguori, kvm, glommer, zamsden, avi, mtosatti

On 15.12.2010 12:53, Ulrich Obergfell wrote:
> ----- "Anthony Liguori"<anthony@codemonkey.ws>  wrote:
>
>> On 12/14/2010 06:09 AM, Ulrich Obergfell wrote:
> [...]
>
>>> Parts 1 thru 4 of this RFC contain experimental source code which
>>> I recently used to investigate the performance benefit. In a Linux
>>> guest, I was running a program that calls gettimeofday() 'n' times
>>> in a loop (the PM Timer register is read during each call). With
>>> in-kernel PM Timer, I observed a significant reduction of program
>>> execution time.
>>>
>> I've played with this in the past.  Can you post real numbers,
>> preferably, with a real work load?
>
> Anthony,
>
> I only experimented with a gettimeofday() loop. With this test scenario
> I observed that in-kernel PM Timer reduced the program execution time to
> roughly half of the execution time that it takes with userspace PM Timer.
> Please find some example results below (these results were obtained while
> the host was not busy). The relative difference of in-kernel PM Timer
> versus userspace PM Timer is high, whereas the absolute difference per
> call appears to be low. So, the benefit much depends on how frequently
> gettimeofday() is called in a real work load. I don't have any numbers
> from a real work load. When I began working on this, I was motivated by
> the fact that the Linux kernel itself provides an optimization for the
> gettimeofday() call ('vxtime'). So, from this I presumed that there
> would be real work loads which would benefit from the optimization of
> the gettimeofday() call (otherwise, why would we have 'vxtime' ?).
> Of course, 'vxtime' is not related to PM based time keeping. However,
> the experimental code shows an approach to optimize gettimeofday() in
> KVM virtual machines.
>
>
> Regards,
>
> Uli
>
> [...]

I am currently analyzing a performance regression together with Gleb where a
Windows 7 / Win2008R2 VM hammers the pmtimer approx. 15000 times/s during
I/O. Performance is thus very bad and the CPU is at 100%.
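As a rough back-of-envelope (assuming the ~2-3us light-vs-heavy exit delta quoted earlier in this thread; real heavy-weight exits through qemu can cost considerably more):

```c
/* Estimate the fraction of one CPU spent on timer-read exits. */
static double exit_cpu_fraction(double exits_per_sec, double cost_us)
{
	return exits_per_sec * cost_us / 1e6;	/* CPU-seconds per second */
}
```

At 15000 exits/s and 3us extra per heavy-weight exit, `exit_cpu_fraction(15000, 3.0)` is only about 0.045, i.e. ~4.5% of a CPU. A CPU pegged at 100% therefore suggests either a much higher per-exit cost in practice or second-order effects such as the qemu_mutex contention mentioned earlier in the thread.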

Has anyone done any further work on the in-kernel pm timer, or a full
implementation?

Would it be possible to rebase this old experimental patch to see if it
helps with the performance regression we came across?

Thank you,
Peter


end of thread, other threads:[~2012-02-21 18:11 UTC | newest]

Thread overview: 19+ messages
-- links below jump to the message on this page --
     [not found] <953393305.700721292337871455.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
2010-12-14 14:44 ` [RFC 0/4] KVM in-kernel PM Timer implementation Ulrich Obergfell
2010-12-14 15:12   ` Avi Kivity
     [not found] <1956121317.795411292413874075.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
2010-12-15 11:53 ` Ulrich Obergfell
2012-02-21 18:10   ` Peter Lieven
     [not found] <344060531.680691292328457867.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
2010-12-14 12:09 ` Ulrich Obergfell
2010-12-14 13:34   ` Avi Kivity
2010-12-14 13:40     ` Glauber Costa
2010-12-14 13:49       ` Avi Kivity
2010-12-14 13:52         ` Gleb Natapov
2010-12-14 15:32         ` Anthony Liguori
2010-12-14 15:38           ` Avi Kivity
2010-12-14 16:04             ` Anthony Liguori
2010-12-15  9:33               ` Avi Kivity
2010-12-14 15:29   ` Anthony Liguori
2010-12-14 18:00     ` David S. Ahern
2010-12-14 19:49       ` Anthony Liguori
2010-12-14 19:54         ` David S. Ahern
2010-12-14 21:46           ` Anthony Liguori
2010-12-14 23:59             ` David S. Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox