* Re: Performance overhead of get_cycles_sync
@ 2007-12-11 14:27 ` Ingo Molnar
0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2007-12-11 14:27 UTC (permalink / raw)
To: dor.laor-atKUWr5tajBWk0Htik3J/w; +Cc: kvm-devel, Linux Kernel Mailing List
* Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Here [include/asm-x86/tsc.h]:
>
> /* Like get_cycles, but make sure the CPU is synchronized. */
> static __always_inline cycles_t get_cycles_sync(void)
> {
> unsigned long long ret;
> unsigned eax, edx;
>
> /*
> * Use RDTSCP if possible; it is guaranteed to be synchronous
> * and doesn't cause a VMEXIT on Hypervisors
> */
> alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
> ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> "a" (0U), "d" (0U) : "ecx", "memory");
> ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
> if (ret)
> return ret;
>
> /*
> * Don't do an additional sync on CPUs where we know
> * RDTSC is already synchronous:
> */
> // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
> rdtscll(ret);
The patch below should resolve this - could you please test and Ack it?
But this CPUID was present in v2.6.23 too, so why did it only show up in
2.6.24-rc for you?
Ingo
-------------->
Subject: x86: fix get_cycles_sync() overhead
From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
get_cycles_sync() is causing massive overhead in KVM networking:
http://lkml.org/lkml/2007/12/11/54
remove the explicit CPUID serialization - it causes VM exits and is
pointless: we care about GTOD coherency but that goes to user-space
via a syscall, and syscalls are serialization points anyway.
Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
---
include/asm-x86/tsc.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
Index: linux-x86.q/include/asm-x86/tsc.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/tsc.h
+++ linux-x86.q/include/asm-x86/tsc.h
@@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
unsigned eax, edx;
/*
- * Use RDTSCP if possible; it is guaranteed to be synchronous
- * and doesn't cause a VMEXIT on Hypervisors
+ * Use RDTSCP if possible; it is guaranteed to be synchronous
+ * and doesn't cause a VMEXIT on Hypervisors
*/
alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
ASM_OUTPUT2("=a" (eax), "=d" (edx)),
@@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
return ret;
/*
- * Don't do an additional sync on CPUs where we know
- * RDTSC is already synchronous:
+ * Use RDTSC on other CPUs. This might not be fully synchronous,
+ * but it's not a problem: the only coherency we care about is
+ * the GTOD output to user-space, and syscalls are synchronization
+ * points anyway:
*/
- alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
- "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
rdtscll(ret);
return ret;
-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: Performance overhead of get_cycles_sync
@ 2007-12-11 15:03 ` Dor Laor
0 siblings, 0 replies; 23+ messages in thread
From: Dor Laor @ 2007-12-11 15:03 UTC (permalink / raw)
To: Ingo Molnar; +Cc: tglx, Linux Kernel Mailing List, kvm-devel
Ingo Molnar wrote:
> * Dor Laor <dor.laor@gmail.com> wrote:
>
>
>> Here [include/asm-x86/tsc.h]:
>>
>> /* Like get_cycles, but make sure the CPU is synchronized. */
>> static __always_inline cycles_t get_cycles_sync(void)
>> {
>> unsigned long long ret;
>> unsigned eax, edx;
>>
>> /*
>> * Use RDTSCP if possible; it is guaranteed to be synchronous
>> * and doesn't cause a VMEXIT on Hypervisors
>> */
>> alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
>> ASM_OUTPUT2("=a" (eax), "=d" (edx)),
>> "a" (0U), "d" (0U) : "ecx", "memory");
>> ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
>> if (ret)
>> return ret;
>>
>> /*
>> * Don't do an additional sync on CPUs where we know
>> * RDTSC is already synchronous:
>> */
>> // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
>> // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
>> rdtscll(ret);
>>
>
> The patch below should resolve this - could you please test and Ack it?
>
It works, actually I already commented it out.
Acked-by: Dor Laor <dor.laor@qumranet.com>
But this CPUID was present in v2.6.23 too, so why did it only show up in
> 2.6.24-rc for you?
>
>
I tried to figure out but all the code movements for i386 go in the way.
In the previous email I reported to Andi that Fedora kernel 2.6.23-8 did
not suffer from it.
Thanks for the ultra fast reply :)
Dor
> Ingo
>
> -------------->
> Subject: x86: fix get_cycles_sync() overhead
> From: Ingo Molnar <mingo@elte.hu>
>
> get_cycles_sync() is causing massive overhead in KVM networking:
>
> http://lkml.org/lkml/2007/12/11/54
>
> remove the explicit CPUID serialization - it causes VM exits and is
> pointless: we care about GTOD coherency but that goes to user-space
> via a syscall, and syscalls are serialization points anyway.
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> include/asm-x86/tsc.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> Index: linux-x86.q/include/asm-x86/tsc.h
> ===================================================================
> --- linux-x86.q.orig/include/asm-x86/tsc.h
> +++ linux-x86.q/include/asm-x86/tsc.h
> @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
> unsigned eax, edx;
>
> /*
> - * Use RDTSCP if possible; it is guaranteed to be synchronous
> - * and doesn't cause a VMEXIT on Hypervisors
> + * Use RDTSCP if possible; it is guaranteed to be synchronous
> + * and doesn't cause a VMEXIT on Hypervisors
> */
> alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
> ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
> return ret;
>
> /*
> - * Don't do an additional sync on CPUs where we know
> - * RDTSC is already synchronous:
> + * Use RDTSC on other CPUs. This might not be fully synchronous,
> + * but it's not a problem: the only coherency we care about is
> + * the GTOD output to user-space, and syscalls are synchronization
> + * points anyway:
> */
> - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
> rdtscll(ret);
>
> return ret;
>
>
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: Performance overhead of get_cycles_sync
@ 2007-12-11 15:03 ` Dor Laor
0 siblings, 0 replies; 23+ messages in thread
From: Dor Laor @ 2007-12-11 15:03 UTC (permalink / raw)
To: Ingo Molnar; +Cc: kvm-devel, Linux Kernel Mailing List
Ingo Molnar wrote:
> * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>
>> Here [include/asm-x86/tsc.h]:
>>
>> /* Like get_cycles, but make sure the CPU is synchronized. */
>> static __always_inline cycles_t get_cycles_sync(void)
>> {
>> unsigned long long ret;
>> unsigned eax, edx;
>>
>> /*
>> * Use RDTSCP if possible; it is guaranteed to be synchronous
>> * and doesn't cause a VMEXIT on Hypervisors
>> */
>> alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
>> ASM_OUTPUT2("=a" (eax), "=d" (edx)),
>> "a" (0U), "d" (0U) : "ecx", "memory");
>> ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
>> if (ret)
>> return ret;
>>
>> /*
>> * Don't do an additional sync on CPUs where we know
>> * RDTSC is already synchronous:
>> */
>> // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
>> // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
>> rdtscll(ret);
>>
>
> The patch below should resolve this - could you please test and Ack it?
>
It works, actually I already commented it out.
Acked-by: Dor Laor <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
But this CPUID was present in v2.6.23 too, so why did it only show up in
> 2.6.24-rc for you?
>
>
I tried to figure out but all the code movements for i386 go in the way.
In the previous email I reported to Andi that Fedora kernel 2.6.23-8 did
not suffer from it.
Thanks for the ultra fast reply :)
Dor
> Ingo
>
> -------------->
> Subject: x86: fix get_cycles_sync() overhead
> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
>
> get_cycles_sync() is causing massive overhead in KVM networking:
>
> http://lkml.org/lkml/2007/12/11/54
>
> remove the explicit CPUID serialization - it causes VM exits and is
> pointless: we care about GTOD coherency but that goes to user-space
> via a syscall, and syscalls are serialization points anyway.
>
> Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> ---
> include/asm-x86/tsc.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> Index: linux-x86.q/include/asm-x86/tsc.h
> ===================================================================
> --- linux-x86.q.orig/include/asm-x86/tsc.h
> +++ linux-x86.q/include/asm-x86/tsc.h
> @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
> unsigned eax, edx;
>
> /*
> - * Use RDTSCP if possible; it is guaranteed to be synchronous
> - * and doesn't cause a VMEXIT on Hypervisors
> + * Use RDTSCP if possible; it is guaranteed to be synchronous
> + * and doesn't cause a VMEXIT on Hypervisors
> */
> alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
> ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
> return ret;
>
> /*
> - * Don't do an additional sync on CPUs where we know
> - * RDTSC is already synchronous:
> + * Use RDTSC on other CPUs. This might not be fully synchronous,
> + * but it's not a problem: the only coherency we care about is
> + * the GTOD output to user-space, and syscalls are synchronization
> + * points anyway:
> */
> - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
> rdtscll(ret);
>
> return ret;
>
>
-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance overhead of get_cycles_sync
2007-12-11 14:27 ` Ingo Molnar
(?)
(?)
@ 2007-12-11 16:35 ` Arjan van de Ven
2007-12-11 17:03 ` Ingo Molnar
-1 siblings, 1 reply; 23+ messages in thread
From: Arjan van de Ven @ 2007-12-11 16:35 UTC (permalink / raw)
To: Ingo Molnar; +Cc: dor.laor, tglx, Linux Kernel Mailing List, kvm-devel
On Tue, 11 Dec 2007 15:27:17 +0100
Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dor Laor <dor.laor@gmail.com> wrote:
>
>
> The patch below should resolve this - could you please test and Ack
> it? But this CPUID was present in v2.6.23 too, so why did it only
> show up in 2.6.24-rc for you?
isn't this probably wrong since this code is also used in the vsyscall code..
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance overhead of get_cycles_sync
@ 2007-12-11 17:03 ` Ingo Molnar
0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2007-12-11 17:03 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: dor.laor, tglx, Linux Kernel Mailing List, kvm-devel
* Arjan van de Ven <arjan@infradead.org> wrote:
> > * Dor Laor <dor.laor@gmail.com> wrote:
> >
> > The patch below should resolve this - could you please test and Ack
> > it? But this CPUID was present in v2.6.23 too, so why did it only
> > show up in 2.6.24-rc for you?
>
> isn't this probably wrong since this code is also used in the vsyscall
> code..
the TSC clocksource (and hence the vsyscall code) is turned off on
systems that fail the TOD/CLOCK portion of this test:
http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
i.e. on the majority of systems in place.
Ingo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance overhead of get_cycles_sync
@ 2007-12-11 17:03 ` Ingo Molnar
0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2007-12-11 17:03 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: kvm-devel, Linux Kernel Mailing List
* Arjan van de Ven <arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> > * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >
> > The patch below should resolve this - could you please test and Ack
> > it? But this CPUID was present in v2.6.23 too, so why did it only
> > show up in 2.6.24-rc for you?
>
> isn't this probably wrong since this code is also used in the vsyscall
> code..
the TSC clocksource (and hence the vsyscall code) is turned off on
systems that fail the TOD/CLOCK portion of this test:
http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
i.e. on the majority of systems in place.
Ingo
-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance overhead of get_cycles_sync
2007-12-11 17:03 ` Ingo Molnar
(?)
@ 2007-12-11 17:23 ` Andi Kleen
2007-12-11 20:19 ` Ingo Molnar
-1 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2007-12-11 17:23 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel
Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> writes:
> the TSC clocksource (and hence the vsyscall code) is turned off on
> systems that fail the TOD/CLOCK portion of this test:
Which is not on core2 which was the question about. And if it was
turned off it wouldn't use get_cycles_sync() at all.
-Andi
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance overhead of get_cycles_sync
2007-12-11 17:23 ` Andi Kleen
@ 2007-12-11 20:19 ` Ingo Molnar
2007-12-11 20:29 ` Ingo Molnar
0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2007-12-11 20:19 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
* Andi Kleen <ak@suse.de> wrote:
> Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> writes:
>
> > the TSC clocksource (and hence the vsyscall code) is turned off on
> > systems that fail the TOD/CLOCK portion of this test:
>
> Which is not on core2 which was the question about. And if it was
> turned off it wouldn't use get_cycles_sync() at all.
it is turned off on core2 too:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
Ingo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance overhead of get_cycles_sync
@ 2007-12-11 20:29 ` Ingo Molnar
0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2007-12-11 20:29 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel, Thomas Gleixner, Arjan van de Ven, dor.laor,
kvm-devel
* Ingo Molnar <mingo@elte.hu> wrote:
> > Which is not on core2 which was the question about. And if it was
> > turned off it wouldn't use get_cycles_sync() at all.
>
> it is turned off on core2 too:
>
> # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> acpi_pm
but you are right that it's not turned off on all core2's, so my patch
is wrong for them.
Ingo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance overhead of get_cycles_sync
@ 2007-12-11 20:29 ` Ingo Molnar
0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2007-12-11 20:29 UTC (permalink / raw)
To: Andi Kleen; +Cc: kvm-devel, linux-kernel-u79uwXL29TY76Z2rM5mHXA
* Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
> > Which is not on core2 which was the question about. And if it was
> > turned off it wouldn't use get_cycles_sync() at all.
>
> it is turned off on core2 too:
>
> # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> acpi_pm
but you are right that it's not turned off on all core2's, so my patch
is wrong for them.
Ingo
-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [kvm-devel] Performance overhead of get_cycles_sync
@ 2007-12-11 21:26 ` Joerg Roedel
0 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2007-12-11 21:26 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andi Kleen, dor.laor, kvm-devel, Linux Kernel Mailing List
On Tue, Dec 11, 2007 at 03:27:17PM +0100, Ingo Molnar wrote:
>
> * Dor Laor <dor.laor@gmail.com> wrote:
>
> > Here [include/asm-x86/tsc.h]:
> >
> > /* Like get_cycles, but make sure the CPU is synchronized. */
> > static __always_inline cycles_t get_cycles_sync(void)
> > {
> > unsigned long long ret;
> > unsigned eax, edx;
> >
> > /*
> > * Use RDTSCP if possible; it is guaranteed to be synchronous
> > * and doesn't cause a VMEXIT on Hypervisors
> > */
> > alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
> > ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> > "a" (0U), "d" (0U) : "ecx", "memory");
> > ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
> > if (ret)
> > return ret;
> >
> > /*
> > * Don't do an additional sync on CPUs where we know
> > * RDTSC is already synchronous:
> > */
> > // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> > // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
> > rdtscll(ret);
>
> The patch below should resolve this - could you please test and Ack it?
> But this CPUID was present in v2.6.23 too, so why did it only show up in
> 2.6.24-rc for you?
>
> Ingo
>
> -------------->
> Subject: x86: fix get_cycles_sync() overhead
> From: Ingo Molnar <mingo@elte.hu>
>
> get_cycles_sync() is causing massive overhead in KVM networking:
>
> http://lkml.org/lkml/2007/12/11/54
>
> remove the explicit CPUID serialization - it causes VM exits and is
> pointless: we care about GTOD coherency but that goes to user-space
> via a syscall, and syscalls are serialization points anyway.
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> include/asm-x86/tsc.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> Index: linux-x86.q/include/asm-x86/tsc.h
> ===================================================================
> --- linux-x86.q.orig/include/asm-x86/tsc.h
> +++ linux-x86.q/include/asm-x86/tsc.h
> @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
> unsigned eax, edx;
>
> /*
> - * Use RDTSCP if possible; it is guaranteed to be synchronous
> - * and doesn't cause a VMEXIT on Hypervisors
> + * Use RDTSCP if possible; it is guaranteed to be synchronous
> + * and doesn't cause a VMEXIT on Hypervisors
> */
> alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
> ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
> return ret;
>
> /*
> - * Don't do an additional sync on CPUs where we know
> - * RDTSC is already synchronous:
> + * Use RDTSC on other CPUs. This might not be fully synchronous,
> + * but it's not a problem: the only coherency we care about is
> + * the GTOD output to user-space, and syscalls are synchronization
> + * points anyway:
> */
> - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
> rdtscll(ret);
>
> return ret;
I don't think this is a good idea. I discussed exactly this item with
Andi Kleen a while ago and afair the serializing instruction was
necessary to fix a backwards walking gettimeofday() on some K8
revisions. Andi Kleen can tell more details, I added him to the CC list.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: Performance overhead of get_cycles_sync
@ 2007-12-11 21:26 ` Joerg Roedel
0 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2007-12-11 21:26 UTC (permalink / raw)
To: Ingo Molnar; +Cc: kvm-devel, Andi Kleen, Linux Kernel Mailing List
On Tue, Dec 11, 2007 at 03:27:17PM +0100, Ingo Molnar wrote:
>
> * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> > Here [include/asm-x86/tsc.h]:
> >
> > /* Like get_cycles, but make sure the CPU is synchronized. */
> > static __always_inline cycles_t get_cycles_sync(void)
> > {
> > unsigned long long ret;
> > unsigned eax, edx;
> >
> > /*
> > * Use RDTSCP if possible; it is guaranteed to be synchronous
> > * and doesn't cause a VMEXIT on Hypervisors
> > */
> > alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
> > ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> > "a" (0U), "d" (0U) : "ecx", "memory");
> > ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
> > if (ret)
> > return ret;
> >
> > /*
> > * Don't do an additional sync on CPUs where we know
> > * RDTSC is already synchronous:
> > */
> > // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> > // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
> > rdtscll(ret);
>
> The patch below should resolve this - could you please test and Ack it?
> But this CPUID was present in v2.6.23 too, so why did it only show up in
> 2.6.24-rc for you?
>
> Ingo
>
> -------------->
> Subject: x86: fix get_cycles_sync() overhead
> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
>
> get_cycles_sync() is causing massive overhead in KVM networking:
>
> http://lkml.org/lkml/2007/12/11/54
>
> remove the explicit CPUID serialization - it causes VM exits and is
> pointless: we care about GTOD coherency but that goes to user-space
> via a syscall, and syscalls are serialization points anyway.
>
> Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> ---
> include/asm-x86/tsc.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> Index: linux-x86.q/include/asm-x86/tsc.h
> ===================================================================
> --- linux-x86.q.orig/include/asm-x86/tsc.h
> +++ linux-x86.q/include/asm-x86/tsc.h
> @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
> unsigned eax, edx;
>
> /*
> - * Use RDTSCP if possible; it is guaranteed to be synchronous
> - * and doesn't cause a VMEXIT on Hypervisors
> + * Use RDTSCP if possible; it is guaranteed to be synchronous
> + * and doesn't cause a VMEXIT on Hypervisors
> */
> alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
> ASM_OUTPUT2("=a" (eax), "=d" (edx)),
> @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
> return ret;
>
> /*
> - * Don't do an additional sync on CPUs where we know
> - * RDTSC is already synchronous:
> + * Use RDTSC on other CPUs. This might not be fully synchronous,
> + * but it's not a problem: the only coherency we care about is
> + * the GTOD output to user-space, and syscalls are synchronization
> + * points anyway:
> */
> - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
> - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
> rdtscll(ret);
>
> return ret;
I don't think this is a good idea. I discussed exactly this item with
Andi Kleen a while ago and afair the serializing instruction was
necessary to fix a backwards walking gettimeofday() on some K8
revisions. Andi Kleen can tell more details, I added him to the CC list.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
^ permalink raw reply [flat|nested] 23+ messages in thread[parent not found: <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org>]
* Re: Performance overhead of get_cycles_sync
[not found] ` <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org>
@ 2007-12-12 0:19 ` Dor Laor
0 siblings, 0 replies; 23+ messages in thread
From: Dor Laor @ 2007-12-12 0:19 UTC (permalink / raw)
To: Joerg Roedel; +Cc: kvm-devel, Andi Kleen, Linux Kernel Mailing List
[-- Attachment #1.1: Type: text/plain, Size: 1068 bytes --]
>> - * Don't do an additional sync on CPUs where we know
>> - * RDTSC is already synchronous:
>> + * Use RDTSC on other CPUs. This might not be fully synchronous,
>> + * but it's not a problem: the only coherency we care about is
>> + * the GTOD output to user-space, and syscalls are synchronization
>> + * points anyway:
>> */
>> - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
>> - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
>> rdtscll(ret);
>>
>> return ret;
>>
>
> I don't think this is a good idea. I discussed exactly this item with
> Andi Kleen a while ago and afair the serializing instruction was
> necessary to fix a backwards walking gettimeofday() on some K8
> revisions. Andi Kleen can tell more details, I added him to the CC list.
>
> Joerg
>
>
So I suggest we'll wait for Andi Kleen's 24' patch using [l|m]fence.
Meanwhile one shouldn't use tsc clock source in guests or alternatively
mascaraed the guest cpu as
old Intel version [one can do that with kvm, although it hurt performance].
Regards,
Dor
[-- Attachment #1.2: Type: text/html, Size: 1453 bytes --]
[-- Attachment #2: Type: text/plain, Size: 277 bytes --]
-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
[-- Attachment #3: Type: text/plain, Size: 186 bytes --]
_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel
^ permalink raw reply [flat|nested] 23+ messages in thread