* Performance overhead of get_cycles_sync @ 2007-12-11 13:11 Dor Laor 2007-12-11 13:37 ` Ingo Molnar 0 siblings, 1 reply; 11+ messages in thread From: Dor Laor @ 2007-12-11 13:11 UTC (permalink / raw) To: mingo-X9Un+BFzKDI, tglx-hfZtesqFncYOwBW4kG4KsQ Cc: kvm-devel, Linux Kernel Mailing List Hi Ingo, Thomas, In the latest kernel (2.6.24-rc3) I noticed a drastic performance decrease for KVM networking. The reason is many vmexit (exit reason is cpuid instruction) caused by calls to gettimeofday that uses tsc sourceclock. read_tsc calls get_cycles_sync which might call cpuid in order to serialize the cpu. Can you explain why the cpu needs to be serialized for every gettime call? Do we need to be that accurate? (It will also slightly improve physical hosts). I believe you have a reason and the answer is yes. In that case can you replace the serializing instruction with an instruction that does not trigger vmexit? Maybe use 'ltr' for example? Regards, Dor. ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance overhead of get_cycles_sync 2007-12-11 13:11 Performance overhead of get_cycles_sync Dor Laor @ 2007-12-11 13:37 ` Ingo Molnar [not found] ` <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org> 0 siblings, 1 reply; 11+ messages in thread From: Ingo Molnar @ 2007-12-11 13:37 UTC (permalink / raw) To: dor.laor; +Cc: tglx, Linux Kernel Mailing List, kvm-devel * Dor Laor <dor.laor@gmail.com> wrote: > Hi Ingo, Thomas, > > In the latest kernel (2.6.24-rc3) I noticed a drastic performance > decrease for KVM networking. The reason is many vmexit (exit reason is > cpuid instruction) caused by calls to gettimeofday that uses tsc > sourceclock. read_tsc calls get_cycles_sync which might call cpuid in > order to serialize the cpu. > > Can you explain why the cpu needs to be serialized for every gettime > call? Do we need to be that accurate? (It will also slightly improve > physical hosts). I believe you have a reason and the answer is yes. In > that case can you replace the serializing instruction with an > instruction that does not trigger vmexit? Maybe use 'ltr' for example? hm, where exactly does it call CPUID? Ingo ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org>]
* Re: Performance overhead of get_cycles_sync [not found] ` <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org> @ 2007-12-11 14:11 ` Dor Laor [not found] ` <475E9A92.4030001-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 2007-12-11 14:14 ` Dor Laor 1 sibling, 1 reply; 11+ messages in thread From: Dor Laor @ 2007-12-11 14:11 UTC (permalink / raw) To: Ingo Molnar; +Cc: kvm-devel, Linux Kernel Mailing List [-- Attachment #1.1: Type: text/plain, Size: 1764 bytes --] Ingo Molnar wrote: > > * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > Hi Ingo, Thomas, > > > > In the latest kernel (2.6.24-rc3) I noticed a drastic performance > > decrease for KVM networking. The reason is many vmexit (exit reason is > > cpuid instruction) caused by calls to gettimeofday that uses tsc > > sourceclock. read_tsc calls get_cycles_sync which might call cpuid in > > order to serialize the cpu. > > > > Can you explain why the cpu needs to be serialized for every gettime > > call? Do we need to be that accurate? (It will also slightly improve > > physical hosts). I believe you have a reason and the answer is yes. In > > that case can you replace the serializing instruction with an > > instruction that does not trigger vmexit? Maybe use 'ltr' for example? > > hm, where exactly does it call CPUID? > > Ingo > Here [include/asm-x86/tsc.h]: /* Like get_cycles, but make sure the CPU is synchronized. */ static __always_inline cycles_t get_cycles_sync(void) { unsigned long long ret; unsigned eax, edx; /* * Use RDTSCP if possible; it is guaranteed to be synchronous * and doesn't cause a VMEXIT on Hypervisors */ alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP, ASM_OUTPUT2("=a" (eax), "=d" (edx)), "a" (0U), "d" (0U) : "ecx", "memory"); ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax); if (ret) return ret; /* * Don't do an additional sync on CPUs where we know * RDTSC is already synchronous: */ // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); rdtscll(ret); return ret; } [-- Attachment #1.2: Type: text/html, Size: 3190 bytes --] [-- Attachment #2: Type: text/plain, Size: 277 bytes --] ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php [-- Attachment #3: Type: text/plain, Size: 186 bytes --] _______________________________________________ kvm-devel mailing list kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org https://lists.sourceforge.net/lists/listinfo/kvm-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <475E9A92.4030001-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: Performance overhead of get_cycles_sync [not found] ` <475E9A92.4030001-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2007-12-11 14:27 ` Ingo Molnar 2007-12-11 16:35 ` Arjan van de Ven [not found] ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org> 0 siblings, 2 replies; 11+ messages in thread From: Ingo Molnar @ 2007-12-11 14:27 UTC (permalink / raw) To: dor.laor-atKUWr5tajBWk0Htik3J/w; +Cc: kvm-devel, Linux Kernel Mailing List * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > Here [include/asm-x86/tsc.h]: > > /* Like get_cycles, but make sure the CPU is synchronized. */ > static __always_inline cycles_t get_cycles_sync(void) > { > unsigned long long ret; > unsigned eax, edx; > > /* > * Use RDTSCP if possible; it is guaranteed to be synchronous > * and doesn't cause a VMEXIT on Hypervisors > */ > alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP, > ASM_OUTPUT2("=a" (eax), "=d" (edx)), > "a" (0U), "d" (0U) : "ecx", "memory"); > ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax); > if (ret) > return ret; > > /* > * Don't do an additional sync on CPUs where we know > * RDTSC is already synchronous: > */ > // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, > // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); > rdtscll(ret); The patch below should resolve this - could you please test and Ack it? But this CPUID was present in v2.6.23 too, so why did it only show up in 2.6.24-rc for you? Ingo --------------> Subject: x86: fix get_cycles_sync() overhead From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> get_cycles_sync() is causing massive overhead in KVM networking: http://lkml.org/lkml/2007/12/11/54 remove the explicit CPUID serialization - it causes VM exits and is pointless: we care about GTOD coherency but that goes to user-space via a syscall, and syscalls are serialization points anyway. Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> --- include/asm-x86/tsc.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) Index: linux-x86.q/include/asm-x86/tsc.h =================================================================== --- linux-x86.q.orig/include/asm-x86/tsc.h +++ linux-x86.q/include/asm-x86/tsc.h @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl unsigned eax, edx; /* - * Use RDTSCP if possible; it is guaranteed to be synchronous - * and doesn't cause a VMEXIT on Hypervisors + * Use RDTSCP if possible; it is guaranteed to be synchronous + * and doesn't cause a VMEXIT on Hypervisors */ alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP, ASM_OUTPUT2("=a" (eax), "=d" (edx)), @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl return ret; /* - * Don't do an additional sync on CPUs where we know - * RDTSC is already synchronous: + * Use RDTSC on other CPUs. This might not be fully synchronous, + * but it's not a problem: the only coherency we care about is + * the GTOD output to user-space, and syscalls are synchronization + * points anyway: */ - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); rdtscll(ret); return ret; ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance overhead of get_cycles_sync 2007-12-11 14:27 ` Ingo Molnar @ 2007-12-11 16:35 ` Arjan van de Ven [not found] ` <20071211083513.56c2a385-NIQFrBLA1CpScpXdPBN83iCwEArCW2h5@public.gmane.org> [not found] ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org> 1 sibling, 1 reply; 11+ messages in thread From: Arjan van de Ven @ 2007-12-11 16:35 UTC (permalink / raw) To: Ingo Molnar; +Cc: dor.laor, tglx, Linux Kernel Mailing List, kvm-devel On Tue, 11 Dec 2007 15:27:17 +0100 Ingo Molnar <mingo@elte.hu> wrote: > > * Dor Laor <dor.laor@gmail.com> wrote: > > > The patch below should resolve this - could you please test and Ack > it? But this CPUID was present in v2.6.23 too, so why did it only > show up in 2.6.24-rc for you? isn't this probably wrong since this code is also used in the vsyscall code.. ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <20071211083513.56c2a385-NIQFrBLA1CpScpXdPBN83iCwEArCW2h5@public.gmane.org>]
* Re: Performance overhead of get_cycles_sync [not found] ` <20071211083513.56c2a385-NIQFrBLA1CpScpXdPBN83iCwEArCW2h5@public.gmane.org> @ 2007-12-11 17:03 ` Ingo Molnar [not found] ` <p73abohno0u.fsf@bingen.suse.de> 0 siblings, 1 reply; 11+ messages in thread From: Ingo Molnar @ 2007-12-11 17:03 UTC (permalink / raw) To: Arjan van de Ven; +Cc: kvm-devel, Linux Kernel Mailing List * Arjan van de Ven <arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote: > > * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > > The patch below should resolve this - could you please test and Ack > > it? But this CPUID was present in v2.6.23 too, so why did it only > > show up in 2.6.24-rc for you? > > isn't this probably wrong since this code is also used in the vsyscall > code.. the TSC clocksource (and hence the vsyscall code) is turned off on systems that fail the TOD/CLOCK portion of this test: http://people.redhat.com/mingo/time-warp-test/time-warp-test.c i.e. on the majority of systems in place. Ingo ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <p73abohno0u.fsf@bingen.suse.de>]
[parent not found: <20071211201930.GB22397@elte.hu>]
[parent not found: <20071211201930.GB22397-X9Un+BFzKDI@public.gmane.org>]
* Re: Performance overhead of get_cycles_sync [not found] ` <20071211201930.GB22397-X9Un+BFzKDI@public.gmane.org> @ 2007-12-11 20:29 ` Ingo Molnar 0 siblings, 0 replies; 11+ messages in thread From: Ingo Molnar @ 2007-12-11 20:29 UTC (permalink / raw) To: Andi Kleen; +Cc: kvm-devel, linux-kernel-u79uwXL29TY76Z2rM5mHXA * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote: > > Which is not on core2 which was the question about. And if it was > > turned off it wouldn't use get_cycles_sync() at all. > > it is turned off on core2 too: > > # cat /sys/devices/system/clocksource/clocksource0/current_clocksource > acpi_pm but you are right that it's not turned off on all core2's, so my patch is wrong for them. Ingo ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org>]
* Re: Performance overhead of get_cycles_sync [not found] ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org> @ 2007-12-11 15:03 ` Dor Laor 2007-12-11 21:26 ` Joerg Roedel 1 sibling, 0 replies; 11+ messages in thread From: Dor Laor @ 2007-12-11 15:03 UTC (permalink / raw) To: Ingo Molnar; +Cc: kvm-devel, Linux Kernel Mailing List Ingo Molnar wrote: > * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > >> Here [include/asm-x86/tsc.h]: >> >> /* Like get_cycles, but make sure the CPU is synchronized. */ >> static __always_inline cycles_t get_cycles_sync(void) >> { >> unsigned long long ret; >> unsigned eax, edx; >> >> /* >> * Use RDTSCP if possible; it is guaranteed to be synchronous >> * and doesn't cause a VMEXIT on Hypervisors >> */ >> alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP, >> ASM_OUTPUT2("=a" (eax), "=d" (edx)), >> "a" (0U), "d" (0U) : "ecx", "memory"); >> ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax); >> if (ret) >> return ret; >> >> /* >> * Don't do an additional sync on CPUs where we know >> * RDTSC is already synchronous: >> */ >> // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, >> // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); >> rdtscll(ret); >> > > The patch below should resolve this - could you please test and Ack it? > It works, actually I already commented it out. Acked-by: Dor Laor <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org> But this CPUID was present in v2.6.23 too, so why did it only show up in > 2.6.24-rc for you? > > I tried to figure out but all the code movements for i386 go in the way. In the previous email I reported to Andi that Fedora kernel 2.6.23-8 did not suffer from it. Thanks for the ultra fast reply :) Dor > Ingo > > --------------> > Subject: x86: fix get_cycles_sync() overhead > From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> > > get_cycles_sync() is causing massive overhead in KVM networking: > > http://lkml.org/lkml/2007/12/11/54 > > remove the explicit CPUID serialization - it causes VM exits and is > pointless: we care about GTOD coherency but that goes to user-space > via a syscall, and syscalls are serialization points anyway. > > Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> > Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> > --- > include/asm-x86/tsc.h | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > Index: linux-x86.q/include/asm-x86/tsc.h > =================================================================== > --- linux-x86.q.orig/include/asm-x86/tsc.h > +++ linux-x86.q/include/asm-x86/tsc.h > @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl > unsigned eax, edx; > > /* > - * Use RDTSCP if possible; it is guaranteed to be synchronous > - * and doesn't cause a VMEXIT on Hypervisors > + * Use RDTSCP if possible; it is guaranteed to be synchronous > + * and doesn't cause a VMEXIT on Hypervisors > */ > alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP, > ASM_OUTPUT2("=a" (eax), "=d" (edx)), > @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl > return ret; > > /* > - * Don't do an additional sync on CPUs where we know > - * RDTSC is already synchronous: > + * Use RDTSC on other CPUs. This might not be fully synchronous, > + * but it's not a problem: the only coherency we care about is > + * the GTOD output to user-space, and syscalls are synchronization > + * points anyway: > */ > - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, > - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); > rdtscll(ret); > > return ret; > > ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance overhead of get_cycles_sync [not found] ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org> 2007-12-11 15:03 ` Dor Laor @ 2007-12-11 21:26 ` Joerg Roedel [not found] ` <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org> 1 sibling, 1 reply; 11+ messages in thread From: Joerg Roedel @ 2007-12-11 21:26 UTC (permalink / raw) To: Ingo Molnar; +Cc: kvm-devel, Andi Kleen, Linux Kernel Mailing List On Tue, Dec 11, 2007 at 03:27:17PM +0100, Ingo Molnar wrote: > > * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > Here [include/asm-x86/tsc.h]: > > > > /* Like get_cycles, but make sure the CPU is synchronized. */ > > static __always_inline cycles_t get_cycles_sync(void) > > { > > unsigned long long ret; > > unsigned eax, edx; > > > > /* > > * Use RDTSCP if possible; it is guaranteed to be synchronous > > * and doesn't cause a VMEXIT on Hypervisors > > */ > > alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP, > > ASM_OUTPUT2("=a" (eax), "=d" (edx)), > > "a" (0U), "d" (0U) : "ecx", "memory"); > > ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax); > > if (ret) > > return ret; > > > > /* > > * Don't do an additional sync on CPUs where we know > > * RDTSC is already synchronous: > > */ > > // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, > > // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); > > rdtscll(ret); > > The patch below should resolve this - could you please test and Ack it? > But this CPUID was present in v2.6.23 too, so why did it only show up in > 2.6.24-rc for you? > > Ingo > > --------------> > Subject: x86: fix get_cycles_sync() overhead > From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> > > get_cycles_sync() is causing massive overhead in KVM networking: > > http://lkml.org/lkml/2007/12/11/54 > > remove the explicit CPUID serialization - it causes VM exits and is > pointless: we care about GTOD coherency but that goes to user-space > via a syscall, and syscalls are serialization points anyway. > > Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> > Signed-off-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> > --- > include/asm-x86/tsc.h | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > Index: linux-x86.q/include/asm-x86/tsc.h > =================================================================== > --- linux-x86.q.orig/include/asm-x86/tsc.h > +++ linux-x86.q/include/asm-x86/tsc.h > @@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl > unsigned eax, edx; > > /* > - * Use RDTSCP if possible; it is guaranteed to be synchronous > - * and doesn't cause a VMEXIT on Hypervisors > + * Use RDTSCP if possible; it is guaranteed to be synchronous > + * and doesn't cause a VMEXIT on Hypervisors > */ > alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP, > ASM_OUTPUT2("=a" (eax), "=d" (edx)), > @@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl > return ret; > > /* > - * Don't do an additional sync on CPUs where we know > - * RDTSC is already synchronous: > + * Use RDTSC on other CPUs. This might not be fully synchronous, > + * but it's not a problem: the only coherency we care about is > + * the GTOD output to user-space, and syscalls are synchronization > + * points anyway: > */ > - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, > - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); > rdtscll(ret); > > return ret; I don't think this is a good idea. I discussed exactly this item with Andi Kleen a while ago and afair the serializing instruction was necessary to fix a backwards walking gettimeofday() on some K8 revisions. Andi Kleen can tell more details, I added him to the CC list. Joerg -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org>]
* Re: Performance overhead of get_cycles_sync [not found] ` <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org> @ 2007-12-12 0:19 ` Dor Laor 0 siblings, 0 replies; 11+ messages in thread From: Dor Laor @ 2007-12-12 0:19 UTC (permalink / raw) To: Joerg Roedel; +Cc: kvm-devel, Andi Kleen, Linux Kernel Mailing List [-- Attachment #1.1: Type: text/plain, Size: 1068 bytes --] >> - * Don't do an additional sync on CPUs where we know >> - * RDTSC is already synchronous: >> + * Use RDTSC on other CPUs. This might not be fully synchronous, >> + * but it's not a problem: the only coherency we care about is >> + * the GTOD output to user-space, and syscalls are synchronization >> + * points anyway: >> */ >> - alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, >> - "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); >> rdtscll(ret); >> >> return ret; >> > > I don't think this is a good idea. I discussed exactly this item with > Andi Kleen a while ago and afair the serializing instruction was > necessary to fix a backwards walking gettimeofday() on some K8 > revisions. Andi Kleen can tell more details, I added him to the CC list. > > Joerg > > So I suggest we'll wait for Andi Kleen's 24' patch using [l|m]fence. Meanwhile one shouldn't use tsc clock source in guests or alternatively mascaraed the guest cpu as old Intel version [one can do that with kvm, although it hurt performance]. Regards, Dor [-- Attachment #1.2: Type: text/html, Size: 1453 bytes --] [-- Attachment #2: Type: text/plain, Size: 277 bytes --] ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php [-- Attachment #3: Type: text/plain, Size: 186 bytes --] _______________________________________________ kvm-devel mailing list kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org https://lists.sourceforge.net/lists/listinfo/kvm-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance overhead of get_cycles_sync [not found] ` <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org> 2007-12-11 14:11 ` Dor Laor @ 2007-12-11 14:14 ` Dor Laor 1 sibling, 0 replies; 11+ messages in thread From: Dor Laor @ 2007-12-11 14:14 UTC (permalink / raw) To: Ingo Molnar; +Cc: kvm-devel, Linux Kernel Mailing List Ingo Molnar wrote: > > * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > Hi Ingo, Thomas, > > > > In the latest kernel (2.6.24-rc3) I noticed a drastic performance > > decrease for KVM networking. The reason is many vmexit (exit reason is > > cpuid instruction) caused by calls to gettimeofday that uses tsc > > sourceclock. read_tsc calls get_cycles_sync which might call cpuid in > > order to serialize the cpu. > > > > Can you explain why the cpu needs to be serialized for every gettime > > call? Do we need to be that accurate? (It will also slightly improve > > physical hosts). I believe you have a reason and the answer is yes. In > > that case can you replace the serializing instruction with an > > instruction that does not trigger vmexit? Maybe use 'ltr' for example? > > hm, where exactly does it call CPUID? > > Ingo > Here, commented out [include/asm-x86/tsc.h]: /* Like get_cycles, but make sure the CPU is synchronized. */ static __always_inline cycles_t get_cycles_sync(void) { unsigned long long ret; unsigned eax, edx; /* * Use RDTSCP if possible; it is guaranteed to be synchronous * and doesn't cause a VMEXIT on Hypervisors */ alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP, ASM_OUTPUT2("=a" (eax), "=d" (edx)), "a" (0U), "d" (0U) : "ecx", "memory"); ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax); if (ret) return ret; /* * Don't do an additional sync on CPUs where we know * RDTSC is already synchronous: */ // alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC, // "=a" (eax), "0" (1) : "ebx","ecx","edx","memory"); rdtscll(ret); return ret; } ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-12-12 0:19 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-11 13:11 Performance overhead of get_cycles_sync Dor Laor
2007-12-11 13:37 ` Ingo Molnar
[not found] ` <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org>
2007-12-11 14:11 ` Dor Laor
[not found] ` <475E9A92.4030001-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-11 14:27 ` Ingo Molnar
2007-12-11 16:35 ` Arjan van de Ven
[not found] ` <20071211083513.56c2a385-NIQFrBLA1CpScpXdPBN83iCwEArCW2h5@public.gmane.org>
2007-12-11 17:03 ` Ingo Molnar
[not found] ` <p73abohno0u.fsf@bingen.suse.de>
[not found] ` <20071211201930.GB22397@elte.hu>
[not found] ` <20071211201930.GB22397-X9Un+BFzKDI@public.gmane.org>
2007-12-11 20:29 ` Ingo Molnar
[not found] ` <20071211142717.GA15903-X9Un+BFzKDI@public.gmane.org>
2007-12-11 15:03 ` Dor Laor
2007-12-11 21:26 ` Joerg Roedel
[not found] ` <20071211212628.GB6537-5C7GfCeVMHo@public.gmane.org>
2007-12-12 0:19 ` Dor Laor
2007-12-11 14:14 ` Dor Laor
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox