public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* Performance overhead of get_cycles_sync
@ 2007-12-11 13:11 Dor Laor
  2007-12-11 13:37 ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Dor Laor @ 2007-12-11 13:11 UTC (permalink / raw)
  To: mingo-X9Un+BFzKDI, tglx-hfZtesqFncYOwBW4kG4KsQ
  Cc: kvm-devel, Linux Kernel Mailing List

Hi Ingo, Thomas,

In the latest kernel (2.6.24-rc3) I noticed a drastic performance 
decrease for KVM networking.
The reason is many vmexit (exit reason is cpuid instruction) caused by 
calls to gettimeofday that uses tsc sourceclock.
read_tsc calls get_cycles_sync which might call cpuid in order to 
serialize the cpu.

Can you explain why the cpu needs to be serialized for every gettime call?
Do we need to be that accurate? (It will also slightly improve physical 
hosts).
I believe you have a reason and the answer is yes. In that case can you 
replace the serializing instruction
with an instruction that does not trigger vmexit? Maybe use 'ltr' for 
example?

Regards,
Dor.

-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
  2007-12-11 13:11 Performance overhead of get_cycles_sync Dor Laor
@ 2007-12-11 13:37 ` Ingo Molnar
       [not found]   ` <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2007-12-11 13:37 UTC (permalink / raw)
  To: dor.laor; +Cc: tglx, Linux Kernel Mailing List, kvm-devel


* Dor Laor <dor.laor@gmail.com> wrote:

> Hi Ingo, Thomas,
>
> In the latest kernel (2.6.24-rc3) I noticed a drastic performance 
> decrease for KVM networking. The reason is many vmexit (exit reason is 
> cpuid instruction) caused by calls to gettimeofday that uses tsc 
> sourceclock. read_tsc calls get_cycles_sync which might call cpuid in 
> order to serialize the cpu.
>
> Can you explain why the cpu needs to be serialized for every gettime 
> call? Do we need to be that accurate? (It will also slightly improve 
> physical hosts). I believe you have a reason and the answer is yes. In 
> that case can you replace the serializing instruction with an 
> instruction that does not trigger vmexit? Maybe use 'ltr' for example?

hm, where exactly does it call CPUID?

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
       [not found]   ` <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org>
@ 2007-12-11 14:11     ` Dor Laor
       [not found]       ` <475E9A92.4030001-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-12-11 14:14     ` Dor Laor
  1 sibling, 1 reply; 11+ messages in thread
From: Dor Laor @ 2007-12-11 14:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel, Linux Kernel Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 1764 bytes --]

Ingo Molnar wrote:
>
> * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> > Hi Ingo, Thomas,
> >
> > In the latest kernel (2.6.24-rc3) I noticed a drastic performance
> > decrease for KVM networking. The reason is many vmexit (exit reason is
> > cpuid instruction) caused by calls to gettimeofday that uses tsc
> > sourceclock. read_tsc calls get_cycles_sync which might call cpuid in
> > order to serialize the cpu.
> >
> > Can you explain why the cpu needs to be serialized for every gettime
> > call? Do we need to be that accurate? (It will also slightly improve
> > physical hosts). I believe you have a reason and the answer is yes. In
> > that case can you replace the serializing instruction with an
> > instruction that does not trigger vmexit? Maybe use 'ltr' for example?
>
> hm, where exactly does it call CPUID?
>
>         Ingo
>
Here [include/asm-x86/tsc.h]:

/* Like get_cycles, but make sure the CPU is synchronized. */
static __always_inline cycles_t get_cycles_sync(void)
{
    unsigned long long ret;
    unsigned eax, edx;

    /*
       * Use RDTSCP if possible; it is guaranteed to be synchronous
      * and doesn't cause a VMEXIT on Hypervisors
     */
    alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
               ASM_OUTPUT2("=a" (eax), "=d" (edx)),
               "a" (0U), "d" (0U) : "ecx", "memory");
    ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
    if (ret)
        return ret;

    /*
     * Don't do an additional sync on CPUs where we know
     * RDTSC is already synchronous:
     */
//    alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
//              "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
    rdtscll(ret);

    return ret;
}


[-- Attachment #1.2: Type: text/html, Size: 3190 bytes --]

[-- Attachment #2: Type: text/plain, Size: 277 bytes --]

-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

[-- Attachment #3: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
       [not found]   ` <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org>
  2007-12-11 14:11     ` Dor Laor
@ 2007-12-11 14:14     ` Dor Laor
  1 sibling, 0 replies; 11+ messages in thread
From: Dor Laor @ 2007-12-11 14:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel, Linux Kernel Mailing List

Ingo Molnar wrote:
>
> * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> > Hi Ingo, Thomas,
> >
> > In the latest kernel (2.6.24-rc3) I noticed a drastic performance
> > decrease for KVM networking. The reason is many vmexit (exit reason is
> > cpuid instruction) caused by calls to gettimeofday that uses tsc
> > sourceclock. read_tsc calls get_cycles_sync which might call cpuid in
> > order to serialize the cpu.
> >
> > Can you explain why the cpu needs to be serialized for every gettime
> > call? Do we need to be that accurate? (It will also slightly improve
> > physical hosts). I believe you have a reason and the answer is yes. In
> > that case can you replace the serializing instruction with an
> > instruction that does not trigger vmexit? Maybe use 'ltr' for example?
>
> hm, where exactly does it call CPUID?
>
>         Ingo
>
Here, commented out [include/asm-x86/tsc.h]:
/* Like get_cycles, but make sure the CPU is synchronized. */
static __always_inline cycles_t get_cycles_sync(void)
{
    unsigned long long ret;
    unsigned eax, edx;

    /*
       * Use RDTSCP if possible; it is guaranteed to be synchronous
      * and doesn't cause a VMEXIT on Hypervisors
     */
    alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
               ASM_OUTPUT2("=a" (eax), "=d" (edx)),
               "a" (0U), "d" (0U) : "ecx", "memory");
    ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
    if (ret)
        return ret;

    /*
     * Don't do an additional sync on CPUs where we know
     * RDTSC is already synchronous:
     */
//    alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
//              "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
    rdtscll(ret);

    return ret;
}


-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
       [not found]       ` <475E9A92.4030001-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-12-11 14:27         ` Ingo Molnar
       [not found]           ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org>
  2007-12-11 16:35           ` Arjan van de Ven
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2007-12-11 14:27 UTC (permalink / raw)
  To: dor.laor-atKUWr5tajBWk0Htik3J/w; +Cc: kvm-devel, Linux Kernel Mailing List


* Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Here [include/asm-x86/tsc.h]:
>
> /* Like get_cycles, but make sure the CPU is synchronized. */
> static __always_inline cycles_t get_cycles_sync(void)
> {
>    unsigned long long ret;
>    unsigned eax, edx;
>
>    /*
>       * Use RDTSCP if possible; it is guaranteed to be synchronous
>      * and doesn't cause a VMEXIT on Hypervisors
>     */
>    alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
>               ASM_OUTPUT2("=a" (eax), "=d" (edx)),
>               "a" (0U), "d" (0U) : "ecx", "memory");
>    ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
>    if (ret)
>        return ret;
>
>    /*
>     * Don't do an additional sync on CPUs where we know
>     * RDTSC is already synchronous:
>     */
> //    alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> //              "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
>    rdtscll(ret);

The patch below should resolve this - could you please test and Ack it? 
But this CPUID was present in v2.6.23 too, so why did it only show up in 
2.6.24-rc for you?

	Ingo

-------------->
Subject: x86: fix get_cycles_sync() overhead
From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>

get_cycles_sync() is causing massive overhead in KVM networking:

   http://lkml.org/lkml/2007/12/11/54

remove the explicit CPUID serialization - it causes VM exits and is
pointless: we care about GTOD coherency but that goes to user-space
via a syscall, and syscalls are serialization points anyway.

Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
---
 include/asm-x86/tsc.h |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-x86.q/include/asm-x86/tsc.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/tsc.h
+++ linux-x86.q/include/asm-x86/tsc.h
@@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
 	unsigned eax, edx;
 
 	/*
-  	 * Use RDTSCP if possible; it is guaranteed to be synchronous
- 	 * and doesn't cause a VMEXIT on Hypervisors
+	 * Use RDTSCP if possible; it is guaranteed to be synchronous
+	 * and doesn't cause a VMEXIT on Hypervisors
 	 */
 	alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
 		       ASM_OUTPUT2("=a" (eax), "=d" (edx)),
@@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
 		return ret;
 
 	/*
-	 * Don't do an additional sync on CPUs where we know
-	 * RDTSC is already synchronous:
+	 * Use RDTSC on other CPUs. This might not be fully synchronous,
+	 * but it's not a problem: the only coherency we care about is
+	 * the GTOD output to user-space, and syscalls are synchronization
+	 * points anyway:
 	 */
-	alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
-			  "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
 	rdtscll(ret);
 
 	return ret;

-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
       [not found]           ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org>
@ 2007-12-11 15:03             ` Dor Laor
  2007-12-11 21:26             ` Joerg Roedel
  1 sibling, 0 replies; 11+ messages in thread
From: Dor Laor @ 2007-12-11 15:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel, Linux Kernel Mailing List

Ingo Molnar wrote:
> * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>   
>> Here [include/asm-x86/tsc.h]:
>>
>> /* Like get_cycles, but make sure the CPU is synchronized. */
>> static __always_inline cycles_t get_cycles_sync(void)
>> {
>>    unsigned long long ret;
>>    unsigned eax, edx;
>>
>>    /*
>>       * Use RDTSCP if possible; it is guaranteed to be synchronous
>>      * and doesn't cause a VMEXIT on Hypervisors
>>     */
>>    alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
>>               ASM_OUTPUT2("=a" (eax), "=d" (edx)),
>>               "a" (0U), "d" (0U) : "ecx", "memory");
>>    ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
>>    if (ret)
>>        return ret;
>>
>>    /*
>>     * Don't do an additional sync on CPUs where we know
>>     * RDTSC is already synchronous:
>>     */
>> //    alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
>> //              "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
>>    rdtscll(ret);
>>     
>
> The patch below should resolve this - could you please test and Ack it? 
>   
It works, actually I already commented it out.

Acked-by: Dor Laor <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org>

But this CPUID was present in v2.6.23 too, so why did it only show up in
> 2.6.24-rc for you?
>
>   
I tried to figure out but all the code movements for i386 go in the way.
In the previous email I reported to Andi that Fedora kernel 2.6.23-8 did 
not suffer from it.
Thanks for the ultra fast reply :)
Dor
> 	Ingo
>
> -------------->
> Subject: x86: fix get_cycles_sync() overhead
> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
>
> get_cycles_sync() is causing massive overhead in KVM networking:
>
>    http://lkml.org/lkml/2007/12/11/54
>
> remove the explicit CPUID serialization - it causes VM exits and is
> pointless: we care about GTOD coherency but that goes to user-space
> via a syscall, and syscalls are serialization points anyway.
>
> Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> ---
>  include/asm-x86/tsc.h |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> Index: linux-x86.q/include/asm-x86/tsc.h
> ===================================================================
> --- linux-x86.q.orig/include/asm-x86/tsc.h
> +++ linux-x86.q/include/asm-x86/tsc.h
> @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
>  	unsigned eax, edx;
>  
>  	/*
> -  	 * Use RDTSCP if possible; it is guaranteed to be synchronous
> - 	 * and doesn't cause a VMEXIT on Hypervisors
> +	 * Use RDTSCP if possible; it is guaranteed to be synchronous
> +	 * and doesn't cause a VMEXIT on Hypervisors
>  	 */
>  	alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
>  		       ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
>  		return ret;
>  
>  	/*
> -	 * Don't do an additional sync on CPUs where we know
> -	 * RDTSC is already synchronous:
> +	 * Use RDTSC on other CPUs. This might not be fully synchronous,
> +	 * but it's not a problem: the only coherency we care about is
> +	 * the GTOD output to user-space, and syscalls are synchronization
> +	 * points anyway:
>  	 */
> -	alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> -			  "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
>  	rdtscll(ret);
>  
>  	return ret;
>
>   


-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
  2007-12-11 14:27         ` Ingo Molnar
       [not found]           ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org>
@ 2007-12-11 16:35           ` Arjan van de Ven
       [not found]             ` <20071211083513.56c2a385-NIQFrBLA1CpScpXdPBN83iCwEArCW2h5@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Arjan van de Ven @ 2007-12-11 16:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: dor.laor, tglx, Linux Kernel Mailing List, kvm-devel

On Tue, 11 Dec 2007 15:27:17 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Dor Laor <dor.laor@gmail.com> wrote:
> 
> 
> The patch below should resolve this - could you please test and Ack
> it? But this CPUID was present in v2.6.23 too, so why did it only
> show up in 2.6.24-rc for you?

isn't this probably wrong since this code is also used in the vsyscall code..

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
       [not found]             ` <20071211083513.56c2a385-NIQFrBLA1CpScpXdPBN83iCwEArCW2h5@public.gmane.org>
@ 2007-12-11 17:03               ` Ingo Molnar
       [not found]                 ` <p73abohno0u.fsf@bingen.suse.de>
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2007-12-11 17:03 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: kvm-devel, Linux Kernel Mailing List


* Arjan van de Ven <arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:

> > * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > 
> > The patch below should resolve this - could you please test and Ack 
> > it? But this CPUID was present in v2.6.23 too, so why did it only 
> > show up in 2.6.24-rc for you?
> 
> isn't this probably wrong since this code is also used in the vsyscall 
> code..

the TSC clocksource (and hence the vsyscall code) is turned off on 
systems that fail the TOD/CLOCK portion of this test:

  http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

i.e. on the majority of systems in place.

	Ingo

-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
       [not found]                     ` <20071211201930.GB22397-X9Un+BFzKDI@public.gmane.org>
@ 2007-12-11 20:29                       ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2007-12-11 20:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: kvm-devel, linux-kernel-u79uwXL29TY76Z2rM5mHXA


* Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:

> > Which is not on core2 which was the question about. And if it was 
> > turned off it wouldn't use get_cycles_sync() at all.
> 
> it is turned off on core2 too:
> 
>  # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>  acpi_pm

but you are right that it's not turned off on all core2's, so my patch 
is wrong for them.

	Ingo

-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
       [not found]           ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org>
  2007-12-11 15:03             ` Dor Laor
@ 2007-12-11 21:26             ` Joerg Roedel
       [not found]               ` <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Joerg Roedel @ 2007-12-11 21:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel, Andi Kleen, Linux Kernel Mailing List

On Tue, Dec 11, 2007 at 03:27:17PM +0100, Ingo Molnar wrote:
> 
> * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> 
> > Here [include/asm-x86/tsc.h]:
> >
> > /* Like get_cycles, but make sure the CPU is synchronized. */
> > static __always_inline cycles_t get_cycles_sync(void)
> > {
> >    unsigned long long ret;
> >    unsigned eax, edx;
> >
> >    /*
> >       * Use RDTSCP if possible; it is guaranteed to be synchronous
> >      * and doesn't cause a VMEXIT on Hypervisors
> >     */
> >    alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
> >               ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> >               "a" (0U), "d" (0U) : "ecx", "memory");
> >    ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
> >    if (ret)
> >        return ret;
> >
> >    /*
> >     * Don't do an additional sync on CPUs where we know
> >     * RDTSC is already synchronous:
> >     */
> > //    alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> > //              "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
> >    rdtscll(ret);
> 
> The patch below should resolve this - could you please test and Ack it? 
> But this CPUID was present in v2.6.23 too, so why did it only show up in 
> 2.6.24-rc for you?
> 
> 	Ingo
> 
> -------------->
> Subject: x86: fix get_cycles_sync() overhead
> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> 
> get_cycles_sync() is causing massive overhead in KVM networking:
> 
>    http://lkml.org/lkml/2007/12/11/54
> 
> remove the explicit CPUID serialization - it causes VM exits and is
> pointless: we care about GTOD coherency but that goes to user-space
> via a syscall, and syscalls are serialization points anyway.
> 
> Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> ---
>  include/asm-x86/tsc.h |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> Index: linux-x86.q/include/asm-x86/tsc.h
> ===================================================================
> --- linux-x86.q.orig/include/asm-x86/tsc.h
> +++ linux-x86.q/include/asm-x86/tsc.h
> @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
>  	unsigned eax, edx;
>  
>  	/*
> -  	 * Use RDTSCP if possible; it is guaranteed to be synchronous
> - 	 * and doesn't cause a VMEXIT on Hypervisors
> +	 * Use RDTSCP if possible; it is guaranteed to be synchronous
> +	 * and doesn't cause a VMEXIT on Hypervisors
>  	 */
>  	alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
>  		       ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
>  		return ret;
>  
>  	/*
> -	 * Don't do an additional sync on CPUs where we know
> -	 * RDTSC is already synchronous:
> +	 * Use RDTSC on other CPUs. This might not be fully synchronous,
> +	 * but it's not a problem: the only coherency we care about is
> +	 * the GTOD output to user-space, and syscalls are synchronization
> +	 * points anyway:
>  	 */
> -	alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> -			  "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
>  	rdtscll(ret);
>  
>  	return ret;

I don't think this is a good idea. I discussed exactly this item with
Andi Kleen a while ago and afair the serializing instruction was
necessary to fix a backwards walking gettimeofday() on some K8
revisions. Andi Kleen can tell more details, I added him to the CC list.

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance overhead of get_cycles_sync
       [not found]               ` <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org>
@ 2007-12-12  0:19                 ` Dor Laor
  0 siblings, 0 replies; 11+ messages in thread
From: Dor Laor @ 2007-12-12  0:19 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: kvm-devel, Andi Kleen, Linux Kernel Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 1068 bytes --]


>> -	 * Don't do an additional sync on CPUs where we know
>> -	 * RDTSC is already synchronous:
>> +	 * Use RDTSC on other CPUs. This might not be fully synchronous,
>> +	 * but it's not a problem: the only coherency we care about is
>> +	 * the GTOD output to user-space, and syscalls are synchronization
>> +	 * points anyway:
>>  	 */
>> -	alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
>> -			  "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
>>  	rdtscll(ret);
>>  
>>  	return ret;
>>     
>
> I don't think this is a good idea. I discussed exactly this item with
> Andi Kleen a while ago and afair the serializing instruction was
> necessary to fix a backwards walking gettimeofday() on some K8
> revisions. Andi Kleen can tell more details, I added him to the CC list.
>
> Joerg
>
>   
So I suggest we'll wait for Andi Kleen's 24' patch using [l|m]fence.
Meanwhile one shouldn't use tsc clock source in guests or alternatively 
mascaraed the guest cpu as
old Intel version [one can do that with kvm, although it hurt performance].
Regards,
Dor


[-- Attachment #1.2: Type: text/html, Size: 1453 bytes --]

[-- Attachment #2: Type: text/plain, Size: 277 bytes --]

-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

[-- Attachment #3: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-12-12  0:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-11 13:11 Performance overhead of get_cycles_sync Dor Laor
2007-12-11 13:37 ` Ingo Molnar
     [not found]   ` <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org>
2007-12-11 14:11     ` Dor Laor
     [not found]       ` <475E9A92.4030001-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-11 14:27         ` Ingo Molnar
     [not found]           ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org>
2007-12-11 15:03             ` Dor Laor
2007-12-11 21:26             ` Joerg Roedel
     [not found]               ` <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org>
2007-12-12  0:19                 ` Dor Laor
2007-12-11 16:35           ` Arjan van de Ven
     [not found]             ` <20071211083513.56c2a385-NIQFrBLA1CpScpXdPBN83iCwEArCW2h5@public.gmane.org>
2007-12-11 17:03               ` Ingo Molnar
     [not found]                 ` <p73abohno0u.fsf@bingen.suse.de>
     [not found]                   ` <20071211201930.GB22397@elte.hu>
     [not found]                     ` <20071211201930.GB22397-X9Un+BFzKDI@public.gmane.org>
2007-12-11 20:29                       ` Ingo Molnar
2007-12-11 14:14     ` Dor Laor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox