* Whats the purpose of get_cycles_sync()
@ 2007-10-30 17:44 Joerg Roedel
2007-10-30 20:21 ` Andi Kleen
0 siblings, 1 reply; 6+ messages in thread
From: Joerg Roedel @ 2007-10-30 17:44 UTC (permalink / raw)
To: tglx, mingo, hpa; +Cc: linux-kernel, benjamin.serebrin
Hi,
I would like to ask what the specific purpose of the get_cycles_sync()
function is in the x86 architecture. In particular, I wonder why
this function has to be *sync*.
I mean, the sync should guarantee here that the CPU does not execute the
RDTSC instruction out of order, that's clear. But does that really
matter? If there is a cache/TLB miss before the function returns, all the
accuracy gained by the synchronous RDTSC is lost anyway.
The problem here is that this function executes CPUID if RDTSC itself
is not a synchronizing instruction, and CPUID is very often intercepted
by hypervisors (KVM intercepts it, for example). This makes the function
very expensive if the kernel is executed as a guest.
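To make the two read paths concrete, here is a minimal user-space sketch. It is not the kernel's get_cycles_sync() code; the names read_cycles_ordered(), cpu_has_rdtscp() and cpuid_leaf() are made up for illustration, and the clock_gettime() fallback exists only so the sketch builds on non-x86 machines. It assumes GCC or Clang inline assembly.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
/* Execute CPUID for one leaf. CPUID is a serializing instruction, and it
 * is also the instruction a hypervisor typically intercepts. */
static inline void cpuid_leaf(uint32_t leaf, uint32_t *a, uint32_t *b,
                              uint32_t *c, uint32_t *d)
{
    __asm__ __volatile__("cpuid"
                         : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                         : "a"(leaf), "c"(0)
                         : "memory");
}

/* Hypothetical helper: RDTSCP is advertised in CPUID leaf 0x80000001,
 * EDX bit 27. The answer is cached so CPUID runs only on the first call. */
static int cpu_has_rdtscp(void)
{
    static int cached = -1;
    if (cached < 0) {
        uint32_t a, b, c, d;
        cpuid_leaf(0x80000000u, &a, &b, &c, &d);
        if (a < 0x80000001u) {
            cached = 0;
        } else {
            cpuid_leaf(0x80000001u, &a, &b, &c, &d);
            cached = (int)((d >> 27) & 1u);
        }
    }
    return cached;
}
#endif

/* Read the cycle counter so that the read is not moved ahead of earlier
 * instructions: RDTSCP if the CPU reports it, otherwise CPUID + RDTSC. */
uint64_t read_cycles_ordered(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi, aux;
    if (cpu_has_rdtscp()) {
        __asm__ __volatile__("rdtscp"
                             : "=a"(lo), "=d"(hi), "=c"(aux)
                             :
                             : "memory");
        (void)aux;
    } else {
        uint32_t a, b, c, d;
        cpuid_leaf(0, &a, &b, &c, &d);   /* the expensive part in a guest */
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    }
    return ((uint64_t)hi << 32) | lo;
#else
    /* Assumption for portability only: not a cycle counter at all. */
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

In a guest that hides RDTSCP in CPUID, every call takes the CPUID + RDTSC branch, which is the cost described above.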
But maybe I am missing something important here.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Whats the purpose of get_cycles_sync()
2007-10-30 17:44 Whats the purpose of get_cycles_sync() Joerg Roedel
@ 2007-10-30 20:21 ` Andi Kleen
2007-10-30 22:02 ` Vojtech Pavlik
2007-10-31 10:18 ` Joerg Roedel
0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2007-10-30 20:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: tglx, mingo, hpa, linux-kernel, benjamin.serebrin, vojtech
"Joerg Roedel" <joerg.roedel@amd.com> writes:
> I would like to ask what the specific purpose of the get_cycles_sync()
> function is in the x86 architecture. In particular, I wonder why
> this function has to be *sync*.
Vojtech had one test that tested time monotonicity over CPUs
and it constantly failed until we added the CPUID on K8 C stepping.
He can give details on the test.
I suspect the reason was because the CPU reordered the RDTSCs so that
a later RDTSC could return a value before an earlier one. This can
happen because gettimeofday() is so fast that a tight loop calling it can
fit more than one iteration into the CPU's reordering window.
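The kind of check meant here can be sketched as follows. This is not Vojtech's test (its details are not given in this thread); it is a single-threaded approximation with hypothetical names now_ns() and count_backward_steps(), and it uses clock_gettime() so the clock can be chosen by the caller. The original test ran over several CPUs, which this sketch does not attempt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the given clock as nanoseconds; returns -1 if the read fails. */
static int64_t now_ns(clockid_t clk)
{
    struct timespec ts;
    if (clock_gettime(clk, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Call the clock in a tight loop and count how often a later reading is
 * smaller than the one before it. A monotonic time source should give 0. */
long count_backward_steps(clockid_t clk, long iterations)
{
    assert(iterations >= 0);

    long backward = 0;
    int64_t prev = now_ns(clk);

    for (long i = 0; i < iterations; i++) {
        int64_t cur = now_ns(clk);
        if (cur < prev)
            backward++;
        prev = cur;
    }
    return backward;
}
```

For example, count_backward_steps(CLOCK_MONOTONIC, 1000000) is expected to return 0; to look at cross-CPU behaviour, one would have to run such a loop on several CPUs and compare the readings between them.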
> I mean, the sync should guarantee here that the CPU does not execute the
> RDTSC instruction out of order, that's clear. But does that really
> matter? If there is a cache/TLB miss before the function returns, all the
> accuracy gained by the synchronous RDTSC is lost anyway.
>
> The problem here is that this function executes CPUID if RDTSC itself
> is not a synchronizing instruction, and CPUID is very often intercepted
> by hypervisors (KVM intercepts it, for example). This makes the function
> very expensive if the kernel is executed as a guest.
That is why newer kernels use RDTSCP if available which doesn't need
to be intercepted and is synchronous. And since all AMD SVM systems
have RDTSCP they are fine.
On Intel Core2 without RDTSCP the CPUID can still be intercepted right
now, but the real fix there is to re-add FEATURE_SYNC_TSC for Core2 --
the RDTSC there is always monotonic per CPU and the patch that changed
that (f3d73707a1e84f0687a05144b70b660441e999c7) was bogus and must be
reverted. I didn't catch that in time unfortunately.
-Andi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Whats the purpose of get_cycles_sync()
2007-10-30 20:21 ` Andi Kleen
@ 2007-10-30 22:02 ` Vojtech Pavlik
2007-10-30 22:42 ` Andi Kleen
2007-10-31 10:23 ` Joerg Roedel
2007-10-31 10:18 ` Joerg Roedel
1 sibling, 2 replies; 6+ messages in thread
From: Vojtech Pavlik @ 2007-10-30 22:02 UTC (permalink / raw)
To: Andi Kleen
Cc: Joerg Roedel, tglx, mingo, hpa, linux-kernel, benjamin.serebrin
On Tue, Oct 30, 2007 at 09:21:02PM +0100, Andi Kleen wrote:
> "Joerg Roedel" <joerg.roedel@amd.com> writes:
>
> > I would like to ask what the specific purpose of the get_cycles_sync()
> > function is in the x86 architecture. In particular, I wonder why
> > this function has to be *sync*.
>
> Vojtech had one test that tested time monotonicity over CPUs
> and it constantly failed until we added the CPUID on K8 C stepping.
> He can give details on the test.
>
> I suspect the reason was because the CPU reordered the RDTSCs so that
> a later RDTSC could return a value before an earlier one. This can
> happen because gettimeofday() is so fast that a tight loop calling it can
> fit more than one iteration into the CPU's reordering window.
The K8's still guarantee that subsequent RDTSCs return increasing
values, even if the processor reorders them.
What could have been happening then was that the RDTSC instruction might
have been reordered by the CPU out of the seqlock, causing trouble in
the calculation.
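To illustrate why the position of the counter read matters, here is a minimal seqlock reader/writer sketch in C11 atomics. It is not the kernel's seqlock or vsyscall code; struct time_snapshot, snapshot_update(), snapshot_read_ns() and fake_counter() are hypothetical names, and it assumes one writer and one counter tick per nanosecond to keep the arithmetic trivial.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Time base published by the writer and sampled by readers. */
struct time_snapshot {
    atomic_uint seq;               /* odd while an update is in progress */
    _Atomic uint64_t base_ns;      /* time at the last update */
    _Atomic uint64_t base_cycles;  /* counter value at the last update */
};

void snapshot_init(struct time_snapshot *s)
{
    atomic_init(&s->seq, 0u);
    atomic_init(&s->base_ns, 0);
    atomic_init(&s->base_cycles, 0);
}

/* Writer side: make seq odd, update the data, make seq even again. */
void snapshot_update(struct time_snapshot *s, uint64_t ns, uint64_t cycles)
{
    unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
    assert((seq & 1u) == 0);   /* single writer assumed */

    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->base_ns, ns, memory_order_relaxed);
    atomic_store_explicit(&s->base_cycles, cycles, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: the counter must be sampled between the two reads of seq.
 * If the hardware executes the counter read outside that window, "now"
 * can be combined with a base it does not belong to. C fences do not
 * order RDTSC itself, so read_counter has to be an ordered read. */
uint64_t snapshot_read_ns(struct time_snapshot *s,
                          uint64_t (*read_counter)(void))
{
    unsigned seq;
    uint64_t ns, cycles, now;

    do {
        seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        ns = atomic_load_explicit(&s->base_ns, memory_order_relaxed);
        cycles = atomic_load_explicit(&s->base_cycles, memory_order_relaxed);
        now = read_counter();
        atomic_thread_fence(memory_order_acquire);
    } while ((seq & 1u) ||
             seq != atomic_load_explicit(&s->seq, memory_order_relaxed));

    return ns + (now - cycles);   /* assumes 1 tick == 1 ns */
}

/* Stand-in counter for experiments; a real reader would use an ordered
 * TSC read here. */
static uint64_t fake_cycles;
static uint64_t fake_counter(void)
{
    return fake_cycles;
}
```

With a base of 1000 ns at counter value 50 and a current counter of 75, snapshot_read_ns() yields 1025. If the counter were sampled before a concurrent update but paired with the base values written by that update, the retry loop would not notice, which is the failure mode described above.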
Anyway, adding the CPUID didn't solve all the problems we've seen back
then, and so far none of the approaches for using TSC without acquiring
a spinlock on multi-socket AMD boxes worked 100% correctly.
> That is why newer kernels use RDTSCP if available which doesn't need
> to be intercepted and is synchronous. And since all AMD SVM systems
> have RDTSCP they are fine.
>
> On Intel Core2 without RDTSCP the CPUID can still be intercepted right
> now, but the real fix there is to re-add FEATURE_SYNC_TSC for Core2 --
> the RDTSC there is always monotonic per CPU and the patch that changed
> that (f3d73707a1e84f0687a05144b70b660441e999c7) was bogus and must be
> reverted. I didn't catch that in time unfortunately.
--
Vojtech Pavlik
Director SuSE Labs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Whats the purpose of get_cycles_sync()
2007-10-30 22:02 ` Vojtech Pavlik
@ 2007-10-30 22:42 ` Andi Kleen
2007-10-31 10:23 ` Joerg Roedel
1 sibling, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2007-10-30 22:42 UTC (permalink / raw)
To: Vojtech Pavlik
Cc: Andi Kleen, Joerg Roedel, tglx, mingo, hpa, linux-kernel,
benjamin.serebrin
On Tue, Oct 30, 2007 at 11:02:09PM +0100, Vojtech Pavlik wrote:
> > He can give details on the test.
> >
> > I suspect the reason was because the CPU reordered the RDTSCs so that
> > a later RDTSC could return a value before an earlier one. This can
> > happen because gettimeofday() is so fast that a tight loop calling it can
> > fit more than one iteration into the CPU's reordering window.
>
> The K8's still guarantee that subsequent RDTSCs return increasing
> values, even if the processor reorders them.
Ah, I didn't realize this.
>
> What could have been happening then was that the RDTSC instruction might
> have been reordered by the CPU out of the seqlock, causing trouble in
> the calculation.
OK, anyway, it fixed that problem, so it cannot be taken out.
>
> Anyway, adding the CPUID didn't solve all the problems we've seen back
> then, and so far none of the approaches for using TSC without acquiring
> a spinlock on multi-socket AMD boxes worked 100% correctly.
The code is currently not used on multi-core anyway (without Jiri's
patch). It should just work correctly on single core.
-Andi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Whats the purpose of get_cycles_sync()
2007-10-30 22:02 ` Vojtech Pavlik
2007-10-30 22:42 ` Andi Kleen
@ 2007-10-31 10:23 ` Joerg Roedel
1 sibling, 0 replies; 6+ messages in thread
From: Joerg Roedel @ 2007-10-31 10:23 UTC (permalink / raw)
To: Vojtech Pavlik
Cc: Andi Kleen, Joerg Roedel, tglx, mingo, hpa, linux-kernel,
benjamin.serebrin
Hi Vojtech,
On Tue, Oct 30, 2007 at 11:02:09PM +0100, Vojtech Pavlik wrote:
> The K8's still guarantee that subsequent RDTSCs return increasing
> values, even if the processor reorders them.
>
> What could have been happening then was that the RDTSC instruction might
> have been reordered by the CPU out of the seqlock, causing trouble in
> the calculation.
>
> Anyway, adding the CPUID didn't solve all the problems we've seen back
> then, and so far none of the approaches for using TSC without acquiring
> a spinlock on multi-socket AMD boxes worked 100% correctly.
Can you tell me more about the problems you have seen, or give me a
pointer to a mail discussion regarding those problems? Could you also
send me your test program, please? I want to understand these
problems a bit better.
Joerg
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Whats the purpose of get_cycles_sync()
2007-10-30 20:21 ` Andi Kleen
2007-10-30 22:02 ` Vojtech Pavlik
@ 2007-10-31 10:18 ` Joerg Roedel
1 sibling, 0 replies; 6+ messages in thread
From: Joerg Roedel @ 2007-10-31 10:18 UTC (permalink / raw)
To: Andi Kleen
Cc: Joerg Roedel, tglx, mingo, hpa, linux-kernel, benjamin.serebrin,
vojtech
Hi Andi,
On Tue, Oct 30, 2007 at 09:21:02PM +0100, Andi Kleen wrote:
> "Joerg Roedel" <joerg.roedel@amd.com> writes:
>
> > I would like to ask what the specific purpose of the get_cycles_sync()
> > function is in the x86 architecture. In particular, I wonder why
> > this function has to be *sync*.
>
> Vojtech had one test that tested time monotonicity over CPUs
> and it constantly failed until we added the CPUID on K8 C stepping.
> He can give details on the test.
Interesting, I wasn't aware of that.
> I suspect the reason was because the CPU reordered the RDTSCs so that
> a later RDTSC could return a value before an earlier one. This can
> happen because gettimeofday() is so fast that a tight loop calling it can
> fit more than one iteration into the CPU's reordering window.
OK, so is that why the get_cycles_sync() function exists only on
x86_64 and not on i386, because on i386 gettimeofday() is a real
syscall?
> That is why newer kernels use RDTSCP if available which doesn't need
> to be intercepted and is synchronous. And since all AMD SVM systems
> have RDTSCP they are fine.
The problem with KVM here is that they want to migrate guests between
Intel and AMD boxes. So they don't export RDTSCP or FEATURE_SYNC_TSC to
the guests in the CPUID calls. As a result, a 64-bit Linux guest will
execute the CPUID in that function.
Joerg
^ permalink raw reply [flat|nested] 6+ messages in thread