* [PATCH V2] perf: x86: Improve accuracy of perf/sched clock
@ 2015-07-28 21:14 Adrian Hunter
  2015-08-17  7:34 ` Adrian Hunter
  2015-08-20 19:31 ` Thomas Gleixner
  0 siblings, 2 replies; 9+ messages in thread
From: Adrian Hunter @ 2015-07-28 21:14 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Andy Lutomirski, Thomas Gleixner,
	linux-kernel, Stephane Eranian, Andi Kleen

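When the TSC is stable, perf/sched clock is based on it.
However, the conversion from cycles to nanoseconds
is not as accurate as it could be. Because
CYC2NS_SCALE_FACTOR is 10, the accuracy is only about
+/- 1/2048 for a TSC around 1 GHz, and worse at higher
frequencies, where the multiplier is smaller.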
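
To make the accuracy figure concrete, here is a minimal user-space sketch of the old multiplier calculation, DIV_ROUND_CLOSEST(NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR, cpu_khz), plus a helper that reports the relative error of mul / 2^shift against the exact nanoseconds per cycle. The helper names old_cyc2ns_mul() and rel_error() are hypothetical; this is illustrative standard C, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define CYC2NS_SCALE_FACTOR 10 /* the old fixed shift, 2^10 */

/* Old multiplier: round(NSEC_PER_MSEC * 2^10 / khz), i.e. ns per cycle
 * with only 10 fractional bits (hypothetical helper name). */
static uint32_t old_cyc2ns_mul(uint32_t khz)
{
	assert(khz > 0);
	return (uint32_t)(((NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) + khz / 2) / khz);
}

/* Relative error of mul / 2^shift compared with the exact 10^6 / khz
 * nanoseconds per cycle; a negative value means the clock runs slow. */
static double rel_error(uint64_t mul, unsigned int shift, uint32_t khz)
{
	double exact = 1e6 / (double)khz;
	double approx = (double)mul / (double)(1ULL << shift);

	return (approx - exact) / exact;
}
```

For a 3 GHz TSC (khz = 3000000) the old multiplier rounds to 341 instead of 341.33, a relative error of -1/1024 (roughly 977 ppm), twice as large as +/- 1/2048.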

The change is to calculate the maximum shift that
results in a multiplier that still fits in 32 bits.
For example, all frequencies over 1 GHz will have
a shift of 32, making the accuracy of the conversion
about +/- 1/(2^33) just above 1 GHz.
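
The maximum-shift search can be sketched in user space as follows. calc_cyc2ns() and struct cyc2ns_params are hypothetical names; the arithmetic follows the set_cyc2ns_scale() hunk of this patch (start at a shift of 32, round to nearest, then halve the multiplier until it fits in 32 bits), with plain 64-bit division standing in for the kernel's do_div().

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical result type: ns = (cycles * mul) >> shift. */
struct cyc2ns_params {
	uint32_t mul;
	uint32_t shift;
};

static struct cyc2ns_params calc_cyc2ns(uint32_t khz)
{
	struct cyc2ns_params p;
	uint64_t mult;
	uint32_t shift = 32;

	assert(khz > 0);
	/* Try SC = 2^32 first: mult = round(10^6 * 2^32 / khz). */
	mult = ((NSEC_PER_MSEC << 32) + khz / 2) / khz;

	/* Halve SC (and the multiplier) until mult fits in 32 bits. */
	while (mult > UINT32_MAX) {
		mult >>= 1;
		shift--;
	}

	p.mul = (uint32_t)mult;
	p.shift = shift;
	return p;
}
```

With this sketch, 3 GHz gives mul = 1431655765 and shift = 32; since 3 * 1431655765 = 2^32 - 1, the relative error there is 1/(2^32). Exactly 1 GHz still needs shift = 31, because the unshifted multiplier would be 2^32, one more than fits in 32 bits.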

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/kernel/tsc.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7437b41f6a47..e7085bcfb06b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -167,21 +167,21 @@ static void cyc2ns_write_end(int cpu, struct cyc2ns_data *data)
  *              ns = cycles * cyc2ns_scale / SC
  *
  *      And since SC is a constant power of two, we can convert the div
- *  into a shift.
+ *  into a shift. The larger SC is, the more accurate the conversion, but
+ *  cyc2ns_scale needs to be a 32-bit value so that 32-bit multiplication
+ *  (64-bit result) can be used. So start by trying SC = 2^32, reducing
+ *  until the criteria are met.
  *
- *  We can use khz divisor instead of mhz to keep a better precision, since
- *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ *  We can use a khz divisor instead of mhz to keep better precision.
  *  (mathieu.desnoyers@polymtl.ca)
  *
  *                      -johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
-
 static void cyc2ns_data_init(struct cyc2ns_data *data)
 {
 	data->cyc2ns_mul = 0;
-	data->cyc2ns_shift = CYC2NS_SCALE_FACTOR;
+	data->cyc2ns_shift = 0;
 	data->cyc2ns_offset = 0;
 	data->__count = 0;
 }
@@ -215,14 +215,14 @@ static inline unsigned long long cycles_2_ns(unsigned long long cyc)
 
 	if (likely(data == tail)) {
 		ns = data->cyc2ns_offset;
-		ns += mul_u64_u32_shr(cyc, data->cyc2ns_mul, CYC2NS_SCALE_FACTOR);
+		ns += mul_u64_u32_shr(cyc, data->cyc2ns_mul, data->cyc2ns_shift);
 	} else {
 		data->__count++;
 
 		barrier();
 
 		ns = data->cyc2ns_offset;
-		ns += mul_u64_u32_shr(cyc, data->cyc2ns_mul, CYC2NS_SCALE_FACTOR);
+		ns += mul_u64_u32_shr(cyc, data->cyc2ns_mul, data->cyc2ns_shift);
 
 		barrier();
 
@@ -239,6 +239,8 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 	unsigned long long tsc_now, ns_now;
 	struct cyc2ns_data *data;
 	unsigned long flags;
+	u64 mult;
+	u32 shft = 32;
 
 	local_irq_save(flags);
 	sched_clock_idle_sleep_event();
@@ -256,12 +258,17 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 	 * time function is continuous; see the comment near struct
 	 * cyc2ns_data.
 	 */
-	data->cyc2ns_mul =
-		DIV_ROUND_CLOSEST(NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR,
-				  cpu_khz);
-	data->cyc2ns_shift = CYC2NS_SCALE_FACTOR;
+	mult = (u64)NSEC_PER_MSEC << 32;
+	mult += cpu_khz / 2;
+	do_div(mult, cpu_khz);
+	while (mult > U32_MAX) {
+		mult >>= 1;
+		shft -= 1;
+	}
+	data->cyc2ns_mul = mult;
+	data->cyc2ns_shift = shft;
 	data->cyc2ns_offset = ns_now -
-		mul_u64_u32_shr(tsc_now, data->cyc2ns_mul, CYC2NS_SCALE_FACTOR);
+		mul_u64_u32_shr(tsc_now, data->cyc2ns_mul, data->cyc2ns_shift);
 
 	cyc2ns_write_end(cpu, data);
 
-- 
1.9.1
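
On the read side, cycles_2_ns() now takes the shift from the per-CPU data instead of the CYC2NS_SCALE_FACTOR constant. A minimal sketch of that conversion follows, assuming GCC or Clang: mul_u64_u32_shr_sketch() is a user-space stand-in for the kernel's mul_u64_u32_shr() built on the unsigned __int128 extension, and struct cyc2ns_sketch is a hypothetical reduced copy of struct cyc2ns_data without the latch and __count handling.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for mul_u64_u32_shr(): (a * mul) >> shift, keeping the
 * high bits of the 96-bit product (GCC/Clang unsigned __int128). */
static uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	assert(shift <= 32);
	return (uint64_t)(((unsigned __int128)a * mul) >> shift);
}

/* Hypothetical per-CPU conversion state, reduced from struct cyc2ns_data. */
struct cyc2ns_sketch {
	uint32_t mul;
	uint32_t shift;
	uint64_t offset;
};

/* ns = offset + cyc * mul / 2^shift, with the shift read from the data. */
static uint64_t cycles_to_ns(const struct cyc2ns_sketch *d, uint64_t cyc)
{
	return d->offset + mul_u64_u32_shr_sketch(cyc, d->mul, d->shift);
}
```

Feeding one second of a 3 GHz TSC (3000000000 cycles) through the old parameters (mul = 341, shift = 10) gives 999023437 ns, while the new ones (mul = 1431655765, shift = 32) give 999999999 ns.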

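When set_cyc2ns_scale() installs a new multiplier and shift, it also recomputes cyc2ns_offset so that the clock does not jump: the new parameters must map tsc_now to the same ns_now as before. The sketch below shows that step with hypothetical names (struct scale_sketch, rescale_sketch()), again assuming GCC/Clang for unsigned __int128; it leaves out the interrupt disabling and the latch handling around cyc2ns_write_end() that the real function performs.

```c
#include <assert.h>
#include <stdint.h>

/* (a * mul) >> shift without overflow; GCC/Clang unsigned __int128. */
static uint64_t mul_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	assert(shift <= 32);
	return (uint64_t)(((unsigned __int128)a * mul) >> shift);
}

/* Hypothetical reduced conversion state. */
struct scale_sketch {
	uint32_t mul;
	uint32_t shift;
	uint64_t offset;
};

static uint64_t to_ns(const struct scale_sketch *s, uint64_t cyc)
{
	return s->offset + mul_shr(cyc, s->mul, s->shift);
}

/* Switch to new_mul/new_shift at tsc_now without a jump in time.
 * The subtraction may wrap; unsigned arithmetic is modulo 2^64,
 * so adding the product back in to_ns() undoes the wrap. */
static void rescale_sketch(struct scale_sketch *s, uint32_t new_mul,
			   uint32_t new_shift, uint64_t tsc_now)
{
	uint64_t ns_now = to_ns(s, tsc_now);

	s->mul = new_mul;
	s->shift = new_shift;
	s->offset = ns_now - mul_shr(tsc_now, new_mul, new_shift);
}
```

Switching a 3 GHz clock from the old parameters to the new ones at tsc_now = 3000000000 makes the offset wrap (it ends up 976562 below zero), yet to_ns() still returns the same value at tsc_now, and the next 3000000000 cycles advance it by exactly 1000000000 ns.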

Thread overview: 9+ messages
2015-07-28 21:14 [PATCH V2] perf: x86: Improve accuracy of perf/sched clock Adrian Hunter
2015-08-17  7:34 ` Adrian Hunter
2015-08-17  9:56   ` Peter Zijlstra
2015-08-20 19:31 ` Thomas Gleixner
2015-08-21  6:46   ` Adrian Hunter
2015-08-21  9:05     ` [PATCH V3] " Adrian Hunter
2015-09-01  8:36       ` Adrian Hunter
2015-09-01  8:57         ` Peter Zijlstra
2015-09-13 11:08       ` [tip:perf/core] perf/x86: " tip-bot for Adrian Hunter
