All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] MIPS: Guest timekeeping improvements
@ 2016-04-22 17:19 ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: James Hogan, Daniel Lezcano, Thomas Gleixner, linux-mips,
	linux-kernel

These patches improve timekeeping of MIPS/Malta kernels running in a KVM
guest.

Patch 2 fixes malta frequency calculation under virtualisation,
especially on very slow targets (FPGA / emulators). Patch 1 is a minor
fix for something I noticed while writing patch 2.

Patch 3 drops the use of the PIT timer for Malta, which is slow to
emulate with KVM + QEMU.

Finally patch 4 calculates min_delta_ns of cevt-r4k dynamically to
handle virtualised environments with software emulated Count/Compare,
and where Count frequency may not be directly related to actual CPU
speed (and so the static value of 0x300 may be no good).

James Hogan (4):
  MIPS: malta-time: Start GIC count before syncing to RTC
  MIPS: malta-time: Take seconds into account
  MIPS: malta-time: Don't use PIT timer for cevt/csrc
  MIPS: cevt-r4k: Dynamically calculate min_delta_ns

 arch/mips/Kconfig                |  1 -
 arch/mips/kernel/cevt-r4k.c      | 82 +++++++++++++++++++++++++++++++++++++++-
 arch/mips/mti-malta/malta-time.c | 50 +++++++++++++++---------
 3 files changed, 113 insertions(+), 20 deletions(-)

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
-- 
2.4.10

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/4] MIPS: Guest timekeeping improvements
@ 2016-04-22 17:19 ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: James Hogan, Daniel Lezcano, Thomas Gleixner, linux-mips,
	linux-kernel

These patches improve timekeeping of MIPS/Malta kernels running in a KVM
guest.

Patch 2 fixes malta frequency calculation under virtualisation,
especially on very slow targets (FPGA / emulators). Patch 1 is a minor
fix for something I noticed while writing patch 2.

Patch 3 drops the use of the PIT timer for Malta, which is slow to
emulate with KVM + QEMU.

Finally patch 4 calculates min_delta_ns of cevt-r4k dynamically to
handle virtualised environments with software emulated Count/Compare,
and where Count frequency may not be directly related to actual CPU
speed (and so the static value of 0x300 may be no good).

James Hogan (4):
  MIPS: malta-time: Start GIC count before syncing to RTC
  MIPS: malta-time: Take seconds into account
  MIPS: malta-time: Don't use PIT timer for cevt/csrc
  MIPS: cevt-r4k: Dynamically calculate min_delta_ns

 arch/mips/Kconfig                |  1 -
 arch/mips/kernel/cevt-r4k.c      | 82 +++++++++++++++++++++++++++++++++++++++-
 arch/mips/mti-malta/malta-time.c | 50 +++++++++++++++---------
 3 files changed, 113 insertions(+), 20 deletions(-)

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
-- 
2.4.10

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] MIPS: malta-time: Start GIC count before syncing to RTC
@ 2016-04-22 17:19   ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: James Hogan, linux-mips

The sampling of the GIC counter on Malta after observing a rising edge
of the RTC update flag differs slightly between the first and second
sample, with the first sample also calling gic_start_count(). The two
samples should really be taken as similarly as possible to get the most
accurate figure, so move the gic_start_count() call before detecting the
rising edge.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---
 arch/mips/mti-malta/malta-time.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/mips/mti-malta/malta-time.c b/arch/mips/mti-malta/malta-time.c
index b7bf721eabf5..2539687b77f6 100644
--- a/arch/mips/mti-malta/malta-time.c
+++ b/arch/mips/mti-malta/malta-time.c
@@ -81,16 +81,16 @@ static void __init estimate_frequencies(void)
 
 	local_irq_save(flags);
 
-	/* Start counter exactly on falling edge of update flag. */
-	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
-	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
-
-	/* Initialize counters. */
-	start = read_c0_count();
-	if (gic_present) {
+	if (gic_present)
 		gic_start_count();
+
+	/* Read counter exactly on falling edge of update flag. */
+	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
+	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
+
+	start = read_c0_count();
+	if (gic_present)
 		gicstart = gic_read_count();
-	}
 
 	/* Read counter exactly on falling edge of update flag. */
 	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 1/4] MIPS: malta-time: Start GIC count before syncing to RTC
@ 2016-04-22 17:19   ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: James Hogan, linux-mips

The sampling of the GIC counter on Malta after observing a rising edge
of the RTC update flag differs slightly between the first and second
sample, with the first sample also calling gic_start_count(). The two
samples should really be taken as similarly as possible to get the most
accurate figure, so move the gic_start_count() call before detecting the
rising edge.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---
 arch/mips/mti-malta/malta-time.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/mips/mti-malta/malta-time.c b/arch/mips/mti-malta/malta-time.c
index b7bf721eabf5..2539687b77f6 100644
--- a/arch/mips/mti-malta/malta-time.c
+++ b/arch/mips/mti-malta/malta-time.c
@@ -81,16 +81,16 @@ static void __init estimate_frequencies(void)
 
 	local_irq_save(flags);
 
-	/* Start counter exactly on falling edge of update flag. */
-	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
-	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
-
-	/* Initialize counters. */
-	start = read_c0_count();
-	if (gic_present) {
+	if (gic_present)
 		gic_start_count();
+
+	/* Read counter exactly on falling edge of update flag. */
+	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
+	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
+
+	start = read_c0_count();
+	if (gic_present)
 		gicstart = gic_read_count();
-	}
 
 	/* Read counter exactly on falling edge of update flag. */
 	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] MIPS: malta-time: Take seconds into account
@ 2016-04-22 17:19   ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: James Hogan, linux-mips

When estimating the clock frequency based on the RTC, take seconds into
account in case the Update In Progress (UIP) bit wasn't seen. This can
happen in virtual machines (which may get pre-empted by the hypervisor
at inopportune times) with QEMU emulating the RTC (and in fact not
setting the UIP bit for very long), especially on slow hosts such as
FPGA systems and hardware emulators. This results in several seconds
actually having elapsed before seeing the UIP bit instead of just one
second, and exaggerated timer frequencies.

While updating the comments, they're also fixed to match the code in
that the rising edge of the update flag is detected first, not the
falling edge.

The rising edge gives a more precise point to read the counters in a
virtualised system than the falling edge, resulting in a more accurate
frequency.

It does however mean that we have to also wait for the falling edge
before doing the read of the RTC seconds register, otherwise it seems to
be possible in slow hardware emulation to stray into the interval when
the RTC time is undefined during the update (at least 244uS after the
rising edge of the update flag). This can result in both seconds values
reading the same, and it wrapping to 60 seconds, vastly underestimating
the frequency.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---
 arch/mips/mti-malta/malta-time.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/mips/mti-malta/malta-time.c b/arch/mips/mti-malta/malta-time.c
index 2539687b77f6..7407da04f8d6 100644
--- a/arch/mips/mti-malta/malta-time.c
+++ b/arch/mips/mti-malta/malta-time.c
@@ -21,6 +21,7 @@
 #include <linux/i8253.h>
 #include <linux/init.h>
 #include <linux/kernel_stat.h>
+#include <linux/math64.h>
 #include <linux/sched.h>
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
@@ -72,6 +73,8 @@ static void __init estimate_frequencies(void)
 {
 	unsigned long flags;
 	unsigned int count, start;
+	unsigned char secs1, secs2, ctrl;
+	int secs;
 	cycle_t giccount = 0, gicstart = 0;
 
 #if defined(CONFIG_KVM_GUEST) && CONFIG_KVM_GUEST_TIMER_FREQ
@@ -84,29 +87,48 @@ static void __init estimate_frequencies(void)
 	if (gic_present)
 		gic_start_count();
 
-	/* Read counter exactly on falling edge of update flag. */
+	/*
+	 * Read counters exactly on rising edge of update flag.
+	 * This helps get an accurate reading under virtualisation.
+	 */
 	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
 	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
-
 	start = read_c0_count();
 	if (gic_present)
 		gicstart = gic_read_count();
 
-	/* Read counter exactly on falling edge of update flag. */
+	/* Wait for falling edge before reading RTC. */
 	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
-	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
+	secs1 = CMOS_READ(RTC_SECONDS);
 
+	/* Read counters again exactly on rising edge of update flag. */
+	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
 	count = read_c0_count();
 	if (gic_present)
 		giccount = gic_read_count();
 
+	/* Wait for falling edge before reading RTC again. */
+	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
+	secs2 = CMOS_READ(RTC_SECONDS);
+
+	ctrl = CMOS_READ(RTC_CONTROL);
+
 	local_irq_restore(flags);
 
+	if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
+		secs1 = bcd2bin(secs1);
+		secs2 = bcd2bin(secs2);
+	}
+	secs = secs2 - secs1;
+	if (secs < 1)
+		secs += 60;
+
 	count -= start;
+	count /= secs;
 	mips_hpt_frequency = count;
 
 	if (gic_present) {
-		giccount -= gicstart;
+		giccount = div_u64(giccount - gicstart, secs);
 		gic_frequency = giccount;
 	}
 }
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] MIPS: malta-time: Take seconds into account
@ 2016-04-22 17:19   ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: James Hogan, linux-mips

When estimating the clock frequency based on the RTC, take seconds into
account in case the Update In Progress (UIP) bit wasn't seen. This can
happen in virtual machines (which may get pre-empted by the hypervisor
at inopportune times) with QEMU emulating the RTC (and in fact not
setting the UIP bit for very long), especially on slow hosts such as
FPGA systems and hardware emulators. This results in several seconds
actually having elapsed before seeing the UIP bit instead of just one
second, and exaggerated timer frequencies.

While updating the comments, they're also fixed to match the code in
that the rising edge of the update flag is detected first, not the
falling edge.

The rising edge gives a more precise point to read the counters in a
virtualised system than the falling edge, resulting in a more accurate
frequency.

It does however mean that we have to also wait for the falling edge
before doing the read of the RTC seconds register, otherwise it seems to
be possible in slow hardware emulation to stray into the interval when
the RTC time is undefined during the update (at least 244uS after the
rising edge of the update flag). This can result in both seconds values
reading the same, and it wrapping to 60 seconds, vastly underestimating
the frequency.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---
 arch/mips/mti-malta/malta-time.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/mips/mti-malta/malta-time.c b/arch/mips/mti-malta/malta-time.c
index 2539687b77f6..7407da04f8d6 100644
--- a/arch/mips/mti-malta/malta-time.c
+++ b/arch/mips/mti-malta/malta-time.c
@@ -21,6 +21,7 @@
 #include <linux/i8253.h>
 #include <linux/init.h>
 #include <linux/kernel_stat.h>
+#include <linux/math64.h>
 #include <linux/sched.h>
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
@@ -72,6 +73,8 @@ static void __init estimate_frequencies(void)
 {
 	unsigned long flags;
 	unsigned int count, start;
+	unsigned char secs1, secs2, ctrl;
+	int secs;
 	cycle_t giccount = 0, gicstart = 0;
 
 #if defined(CONFIG_KVM_GUEST) && CONFIG_KVM_GUEST_TIMER_FREQ
@@ -84,29 +87,48 @@ static void __init estimate_frequencies(void)
 	if (gic_present)
 		gic_start_count();
 
-	/* Read counter exactly on falling edge of update flag. */
+	/*
+	 * Read counters exactly on rising edge of update flag.
+	 * This helps get an accurate reading under virtualisation.
+	 */
 	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
 	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
-
 	start = read_c0_count();
 	if (gic_present)
 		gicstart = gic_read_count();
 
-	/* Read counter exactly on falling edge of update flag. */
+	/* Wait for falling edge before reading RTC. */
 	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
-	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
+	secs1 = CMOS_READ(RTC_SECONDS);
 
+	/* Read counters again exactly on rising edge of update flag. */
+	while (!(CMOS_READ(RTC_REG_A) & RTC_UIP));
 	count = read_c0_count();
 	if (gic_present)
 		giccount = gic_read_count();
 
+	/* Wait for falling edge before reading RTC again. */
+	while (CMOS_READ(RTC_REG_A) & RTC_UIP);
+	secs2 = CMOS_READ(RTC_SECONDS);
+
+	ctrl = CMOS_READ(RTC_CONTROL);
+
 	local_irq_restore(flags);
 
+	if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
+		secs1 = bcd2bin(secs1);
+		secs2 = bcd2bin(secs2);
+	}
+	secs = secs2 - secs1;
+	if (secs < 1)
+		secs += 60;
+
 	count -= start;
+	count /= secs;
 	mips_hpt_frequency = count;
 
 	if (gic_present) {
-		giccount -= gicstart;
+		giccount = div_u64(giccount - gicstart, secs);
 		gic_frequency = giccount;
 	}
 }
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
@ 2016-04-22 17:19   ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: James Hogan, linux-mips

The PIT timer is slow, especially under virtualisation, compared with
the r4k timer, and doesn't really provide any advantage on Malta since
it doesn't support clock scaling (which is where the PIT has an
advantage).

Drop the use of the PIT timer on Malta so that the r4k or GIC timer will
be used in preference.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---
 arch/mips/Kconfig                | 1 -
 arch/mips/mti-malta/malta-time.c | 6 ------
 2 files changed, 7 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2018c2b0e078..c9a0c70334ce 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -442,7 +442,6 @@ config MIPS_MALTA
 	select IRQ_MIPS_CPU
 	select MIPS_GIC
 	select HW_HAS_PCI
-	select I8253
 	select I8259
 	select MIPS_BONITO64
 	select MIPS_CPU_SCACHE
diff --git a/arch/mips/mti-malta/malta-time.c b/arch/mips/mti-malta/malta-time.c
index 7407da04f8d6..82dd1ea6d630 100644
--- a/arch/mips/mti-malta/malta-time.c
+++ b/arch/mips/mti-malta/malta-time.c
@@ -18,7 +18,6 @@
  * Setting up the clock on the MIPS boards.
  */
 #include <linux/types.h>
-#include <linux/i8253.h>
 #include <linux/init.h>
 #include <linux/kernel_stat.h>
 #include <linux/math64.h>
@@ -225,11 +224,6 @@ void __init plat_time_init(void)
 
 	mips_scroll_message();
 
-#ifdef CONFIG_I8253
-	/* Only Malta has a PIT. */
-	setup_pit_timer();
-#endif
-
 #ifdef CONFIG_MIPS_GIC
 	if (gic_present) {
 		freq = freqround(gic_frequency, 5000);
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
@ 2016-04-22 17:19   ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: James Hogan, linux-mips

The PIT timer is slow, especially under virtualisation, compared with
the r4k timer, and doesn't really provide any advantage on Malta since
it doesn't support clock scaling (which is where the PIT has an
advantage).

Drop the use of the PIT timer on Malta so that the r4k or GIC timer will
be used in preference.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---
 arch/mips/Kconfig                | 1 -
 arch/mips/mti-malta/malta-time.c | 6 ------
 2 files changed, 7 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2018c2b0e078..c9a0c70334ce 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -442,7 +442,6 @@ config MIPS_MALTA
 	select IRQ_MIPS_CPU
 	select MIPS_GIC
 	select HW_HAS_PCI
-	select I8253
 	select I8259
 	select MIPS_BONITO64
 	select MIPS_CPU_SCACHE
diff --git a/arch/mips/mti-malta/malta-time.c b/arch/mips/mti-malta/malta-time.c
index 7407da04f8d6..82dd1ea6d630 100644
--- a/arch/mips/mti-malta/malta-time.c
+++ b/arch/mips/mti-malta/malta-time.c
@@ -18,7 +18,6 @@
  * Setting up the clock on the MIPS boards.
  */
 #include <linux/types.h>
-#include <linux/i8253.h>
 #include <linux/init.h>
 #include <linux/kernel_stat.h>
 #include <linux/math64.h>
@@ -225,11 +224,6 @@ void __init plat_time_init(void)
 
 	mips_scroll_message();
 
-#ifdef CONFIG_I8253
-	/* Only Malta has a PIT. */
-	setup_pit_timer();
-#endif
-
 #ifdef CONFIG_MIPS_GIC
 	if (gic_present) {
 		freq = freqround(gic_frequency, 5000);
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] MIPS: cevt-r4k: Dynamically calculate min_delta_ns
@ 2016-04-22 17:19   ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: James Hogan, Daniel Lezcano, Thomas Gleixner, linux-mips,
	linux-kernel

Calculate the MIPS clockevent device's min_delta_ns dynamically based on
the time it takes to perform the mips_next_event() sequence.

Virtualisation in particular makes the current fixed min_delta of 0x300
inappropriate under some circumstances, as the CP0_Count and CP0_Compare
registers may be being emulated by the hypervisor, and the frequency may
not correspond directly to the CPU frequency.

We actually use twice the median of multiple 75th percentiles of
multiple measurements of how long the mips_next_event() sequence takes,
in order to fairly efficiently eliminate outliers due to unexpected
hypervisor latency (which would need handling with retries when it
occurs during normal operation anyway).

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
---
 arch/mips/kernel/cevt-r4k.c | 82 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 80 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
index 8dfe6a6e1480..e4c21bbf9422 100644
--- a/arch/mips/kernel/cevt-r4k.c
+++ b/arch/mips/kernel/cevt-r4k.c
@@ -28,6 +28,83 @@ static int mips_next_event(unsigned long delta,
 	return res;
 }
 
+/**
+ * calculate_min_delta() - Calculate a good minimum delta for mips_next_event().
+ *
+ * Running under virtualisation can introduce overhead into mips_next_event() in
+ * the form of hypervisor emulation of CP0_Count/CP0_Compare registers,
+ * potentially with an unnatural frequency, which makes a fixed min_delta_ns
+ * value inappropriate as it may be too small.
+ *
+ * It can also introduce occasional latency from the guest being descheduled.
+ *
+ * This function calculates a good minimum delta based roughly on the 75th
+ * percentile of the time taken to do the mips_next_event() sequence, in order
+ * to handle potentially higher overhead while also eliminating outliers due to
+ * unpredictable hypervisor latency (which can be handled by retries).
+ *
+ * Return:	An appropriate minimum delta for the clock event device.
+ */
+static unsigned int calculate_min_delta(void)
+{
+	unsigned int cnt, i, j, k, l;
+	unsigned int buf1[4], buf2[3];
+	unsigned int min_delta;
+
+	/*
+	 * Calculate the median of 5 75th percentiles of 5 samples of how long
+	 * it takes to set CP0_Compare = CP0_Count + delta.
+	 */
+	for (i = 0; i < 5; ++i) {
+		for (j = 0; j < 5; ++j) {
+			/*
+			 * This is like the code in mips_next_event(), and
+			 * directly measures the borderline "safe" delta.
+			 */
+			cnt = read_c0_count();
+			write_c0_compare(cnt);
+			cnt = read_c0_count() - cnt;
+
+			/* Sorted insert into buf1 */
+			for (k = 0; k < j; ++k) {
+				if (cnt < buf1[k]) {
+					l = min_t(unsigned int,
+						  j, ARRAY_SIZE(buf1) - 1);
+					for (; l > k; --l)
+						buf1[l] = buf1[l - 1];
+					break;
+				}
+			}
+			if (k < ARRAY_SIZE(buf1))
+				buf1[k] = cnt;
+		}
+
+		/* Sorted insert of 75th percentile into buf2 */
+		for (k = 0; k < i; ++k) {
+			if (buf1[ARRAY_SIZE(buf1) - 1] < buf2[k]) {
+				l = min_t(unsigned int,
+					  i, ARRAY_SIZE(buf2) - 1);
+				for (; l > k; --l)
+					buf2[l] = buf2[l - 1];
+				break;
+			}
+		}
+		if (k < ARRAY_SIZE(buf2))
+			buf2[k] = buf1[ARRAY_SIZE(buf1) - 1];
+	}
+
+	/* Use 2 * median of 75th percentiles */
+	min_delta = buf2[ARRAY_SIZE(buf2) - 1] * 2;
+
+	/* Don't go too low */
+	if (min_delta < 0x300)
+		min_delta = 0x300;
+
+	pr_debug("%s: median 75th percentile=%#x, min_delta=%#x\n",
+		 __func__, buf2[ARRAY_SIZE(buf2) - 1], min_delta);
+	return min_delta;
+}
+
 DEFINE_PER_CPU(struct clock_event_device, mips_clockevent_device);
 int cp0_timer_irq_installed;
 
@@ -177,7 +254,7 @@ int r4k_clockevent_init(void)
 {
 	unsigned int cpu = smp_processor_id();
 	struct clock_event_device *cd;
-	unsigned int irq;
+	unsigned int irq, min_delta;
 
 	if (!cpu_has_counter || !mips_hpt_frequency)
 		return -ENXIO;
@@ -203,7 +280,8 @@ int r4k_clockevent_init(void)
 
 	/* Calculate the min / max delta */
 	cd->max_delta_ns	= clockevent_delta2ns(0x7fffffff, cd);
-	cd->min_delta_ns	= clockevent_delta2ns(0x300, cd);
+	min_delta		= calculate_min_delta();
+	cd->min_delta_ns	= clockevent_delta2ns(min_delta, cd);
 
 	cd->rating		= 300;
 	cd->irq			= irq;
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] MIPS: cevt-r4k: Dynamically calculate min_delta_ns
@ 2016-04-22 17:19   ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 17:19 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: James Hogan, Daniel Lezcano, Thomas Gleixner, linux-mips,
	linux-kernel

Calculate the MIPS clockevent device's min_delta_ns dynamically based on
the time it takes to perform the mips_next_event() sequence.

Virtualisation in particular makes the current fixed min_delta of 0x300
inappropriate under some circumstances, as the CP0_Count and CP0_Compare
registers may be being emulated by the hypervisor, and the frequency may
not correspond directly to the CPU frequency.

We actually use twice the median of multiple 75th percentiles of
multiple measurements of how long the mips_next_event() sequence takes,
in order to fairly efficiently eliminate outliers due to unexpected
hypervisor latency (which would need handling with retries when it
occurs during normal operation anyway).

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
---
 arch/mips/kernel/cevt-r4k.c | 82 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 80 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
index 8dfe6a6e1480..e4c21bbf9422 100644
--- a/arch/mips/kernel/cevt-r4k.c
+++ b/arch/mips/kernel/cevt-r4k.c
@@ -28,6 +28,83 @@ static int mips_next_event(unsigned long delta,
 	return res;
 }
 
+/**
+ * calculate_min_delta() - Calculate a good minimum delta for mips_next_event().
+ *
+ * Running under virtualisation can introduce overhead into mips_next_event() in
+ * the form of hypervisor emulation of CP0_Count/CP0_Compare registers,
+ * potentially with an unnatural frequency, which makes a fixed min_delta_ns
+ * value inappropriate as it may be too small.
+ *
+ * It can also introduce occasional latency from the guest being descheduled.
+ *
+ * This function calculates a good minimum delta based roughly on the 75th
+ * percentile of the time taken to do the mips_next_event() sequence, in order
+ * to handle potentially higher overhead while also eliminating outliers due to
+ * unpredictable hypervisor latency (which can be handled by retries).
+ *
+ * Return:	An appropriate minimum delta for the clock event device.
+ */
+static unsigned int calculate_min_delta(void)
+{
+	unsigned int cnt, i, j, k, l;
+	unsigned int buf1[4], buf2[3];
+	unsigned int min_delta;
+
+	/*
+	 * Calculate the median of 5 75th percentiles of 5 samples of how long
+	 * it takes to set CP0_Compare = CP0_Count + delta.
+	 */
+	for (i = 0; i < 5; ++i) {
+		for (j = 0; j < 5; ++j) {
+			/*
+			 * This is like the code in mips_next_event(), and
+			 * directly measures the borderline "safe" delta.
+			 */
+			cnt = read_c0_count();
+			write_c0_compare(cnt);
+			cnt = read_c0_count() - cnt;
+
+			/* Sorted insert into buf1 */
+			for (k = 0; k < j; ++k) {
+				if (cnt < buf1[k]) {
+					l = min_t(unsigned int,
+						  j, ARRAY_SIZE(buf1) - 1);
+					for (; l > k; --l)
+						buf1[l] = buf1[l - 1];
+					break;
+				}
+			}
+			if (k < ARRAY_SIZE(buf1))
+				buf1[k] = cnt;
+		}
+
+		/* Sorted insert of 75th percentile into buf2 */
+		for (k = 0; k < i; ++k) {
+			if (buf1[ARRAY_SIZE(buf1) - 1] < buf2[k]) {
+				l = min_t(unsigned int,
+					  i, ARRAY_SIZE(buf2) - 1);
+				for (; l > k; --l)
+					buf2[l] = buf2[l - 1];
+				break;
+			}
+		}
+		if (k < ARRAY_SIZE(buf2))
+			buf2[k] = buf1[ARRAY_SIZE(buf1) - 1];
+	}
+
+	/* Use 2 * median of 75th percentiles */
+	min_delta = buf2[ARRAY_SIZE(buf2) - 1] * 2;
+
+	/* Don't go too low */
+	if (min_delta < 0x300)
+		min_delta = 0x300;
+
+	pr_debug("%s: median 75th percentile=%#x, min_delta=%#x\n",
+		 __func__, buf2[ARRAY_SIZE(buf2) - 1], min_delta);
+	return min_delta;
+}
+
 DEFINE_PER_CPU(struct clock_event_device, mips_clockevent_device);
 int cp0_timer_irq_installed;
 
@@ -177,7 +254,7 @@ int r4k_clockevent_init(void)
 {
 	unsigned int cpu = smp_processor_id();
 	struct clock_event_device *cd;
-	unsigned int irq;
+	unsigned int irq, min_delta;
 
 	if (!cpu_has_counter || !mips_hpt_frequency)
 		return -ENXIO;
@@ -203,7 +280,8 @@ int r4k_clockevent_init(void)
 
 	/* Calculate the min / max delta */
 	cd->max_delta_ns	= clockevent_delta2ns(0x7fffffff, cd);
-	cd->min_delta_ns	= clockevent_delta2ns(0x300, cd);
+	min_delta		= calculate_min_delta();
+	cd->min_delta_ns	= clockevent_delta2ns(min_delta, cd);
 
 	cd->rating		= 300;
 	cd->irq			= irq;
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
@ 2016-04-22 18:57     ` Maciej W. Rozycki
  0 siblings, 0 replies; 17+ messages in thread
From: Maciej W. Rozycki @ 2016-04-22 18:57 UTC (permalink / raw)
  To: James Hogan; +Cc: Ralf Baechle, linux-mips

On Fri, 22 Apr 2016, James Hogan wrote:

> The PIT timer is slow, especially under virtualisation, compared with
> the r4k timer, and doesn't really provide any advantage on Malta since
> it doesn't support clock scaling (which is where the PIT has an
> advantage).

 I'm not sure I'm able to parse this correctly: are you saying that the 
i8254-compatible PIT in Malta's southbridge does not support clock scaling 
for some reason, unlike the same implementation on other platforms?  
What's the reason then, the southbridge is the same as in many x86 PCs?

> Drop the use of the PIT timer on Malta so that the r4k or GIC timer will
> be used in preference.

 Not everyone uses virtualisation, so it's a functional regression for 
them.  Can't you lower the priority for the timer instead so that it is 
not selected by default, just as it's done with other platforms providing 
a choice of timers?

  Maciej

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
@ 2016-04-22 18:57     ` Maciej W. Rozycki
  0 siblings, 0 replies; 17+ messages in thread
From: Maciej W. Rozycki @ 2016-04-22 18:57 UTC (permalink / raw)
  To: James Hogan; +Cc: Ralf Baechle, linux-mips

On Fri, 22 Apr 2016, James Hogan wrote:

> The PIT timer is slow, especially under virtualisation, compared with
> the r4k timer, and doesn't really provide any advantage on Malta since
> it doesn't support clock scaling (which is where the PIT has an
> advantage).

 I'm not sure I'm able to parse this correctly: are you saying that the 
i8254-compatible PIT in Malta's southbridge does not support clock scaling 
for some reason, unlike the same implementation on other platforms?  
What's the reason then, the southbridge is the same as in many x86 PCs?

> Drop the use of the PIT timer on Malta so that the r4k or GIC timer will
> be used in preference.

 Not everyone uses virtualisation, so it's a functional regression for 
them.  Can't you lower the priority for the timer instead so that it is 
not selected by default, just as it's done with other platforms providing 
a choice of timers?

  Maciej

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
@ 2016-04-22 19:23       ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 19:23 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips

[-- Attachment #1: Type: text/plain, Size: 1596 bytes --]

On Fri, Apr 22, 2016 at 07:57:11PM +0100, Maciej W. Rozycki wrote:
> On Fri, 22 Apr 2016, James Hogan wrote:
> 
> > The PIT timer is slow, especially under virtualisation, compared with
> > the r4k timer, and doesn't really provide any advantage on Malta since
> > it doesn't support clock scaling (which is where the PIT has an
> > advantage).
> 
>  I'm not sure I'm able to parse this correctly: are you saying that the 
> i8254-compatible PIT in Malta's southbridge does not support clock scaling 
> for some reason, unlike the same implementation on other platforms?  
> What's the reason then, the southbridge is the same as in many x86 PCs?

Sorry, that wasn't written very well. the "it" on 3rd line is referring
to our malta support. I mean something more like this:

The PIT timer is slow under KVM+QEMU virtualisation compared with the
r4k timer, and doesn't really provide any advantage on Malta since we
don't support clock scaling on Malta, which is where the PIT would have
an advantage over the CPU clock based r4k timer.

> 
> > Drop the use of the PIT timer on Malta so that the r4k or GIC timer will
> > be used in preference.
> 
>  Not everyone uses virtualisation, so it's a functional regression for 
> them.  Can't you lower the priority for the timer instead so that it is 
> not selected by default, just as it's done with other platforms providing 
> a choice of timers?

I'll look into that. Looking back at my IRC logs I suspect I meant to
check why the PIT was taking priority before submitting upstream, but
forgot.

Thanks!
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
@ 2016-04-22 19:23       ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-22 19:23 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips

[-- Attachment #1: Type: text/plain, Size: 1596 bytes --]

On Fri, Apr 22, 2016 at 07:57:11PM +0100, Maciej W. Rozycki wrote:
> On Fri, 22 Apr 2016, James Hogan wrote:
> 
> > The PIT timer is slow, especially under virtualisation, compared with
> > the r4k timer, and doesn't really provide any advantage on Malta since
> > it doesn't support clock scaling (which is where the PIT has an
> > advantage).
> 
>  I'm not sure I'm able to parse this correctly: are you saying that the 
> i8254-compatible PIT in Malta's southbridge does not support clock scaling 
> for some reason, unlike the same implementation on other platforms?  
> What's the reason then, the southbridge is the same as in many x86 PCs?

Sorry, that wasn't written very well. the "it" on 3rd line is referring
to our malta support. I mean something more like this:

The PIT timer is slow under KVM+QEMU virtualisation compared with the
r4k timer, and doesn't really provide any advantage on Malta since we
don't support clock scaling on Malta, which is where the PIT would have
an advantage over the CPU clock based r4k timer.

> 
> > Drop the use of the PIT timer on Malta so that the r4k or GIC timer will
> > be used in preference.
> 
>  Not everyone uses virtualisation, so it's a functional regression for 
> them.  Can't you lower the priority for the timer instead so that it is 
> not selected by default, just as it's done with other platforms providing 
> a choice of timers?

I'll look into that. Looking back at my IRC logs I suspect I meant to
check why the PIT was taking priority before submitting upstream, but
forgot.

Thanks!
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
  2016-04-22 19:23       ` James Hogan
  (?)
@ 2016-04-22 19:29       ` Ralf Baechle
  2016-04-28 16:36           ` James Hogan
  -1 siblings, 1 reply; 17+ messages in thread
From: Ralf Baechle @ 2016-04-22 19:29 UTC (permalink / raw)
  To: James Hogan; +Cc: Maciej W. Rozycki, linux-mips

On Fri, Apr 22, 2016 at 08:23:12PM +0100, James Hogan wrote:

> >  Not everyone uses virtualisation, so it's a functional regression for 
> > them.  Can't you lower the priority for the timer instead so that it is 
> > not selected by default, just as it's done with other platforms providing 
> > a choice of timers?
> 
> I'll look into that. Looking back at my IRC logs I suspect I meant to
> check why the PIT was taking priority before submitting upstream, but
> forgot.

The PIT already has a very low rating and should only be used if everything
else fails.  Clock scaling for example would make the cycle counter
unusable, there might be no GIC available etc.  Otoh with SMP the PIT is
only usable as a clock source but not clock event device.

  Ralf

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
@ 2016-04-28 16:36           ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-28 16:36 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Maciej W. Rozycki, linux-mips

[-- Attachment #1: Type: text/plain, Size: 1144 bytes --]

On Fri, Apr 22, 2016 at 09:29:07PM +0200, Ralf Baechle wrote:
> On Fri, Apr 22, 2016 at 08:23:12PM +0100, James Hogan wrote:
> 
> > >  Not everyone uses virtualisation, so it's a functional regression for 
> > > them.  Can't you lower the priority for the timer instead so that it is 
> > > not selected by default, just as it's done with other platforms providing 
> > > a choice of timers?
> > 
> > I'll look into that. Looking back at my IRC logs I suspect I meant to
> > check why the PIT was taking priority before submitting upstream, but
> > forgot.
> 
> The PIT already has a very low rating and should only be used if everything
> else fails.  Clock scaling for example would make the cycle counter
> unusable, there might be no GIC available etc.  Otoh with SMP the PIT is
> only usable as a clock source but not clock event device.

I can't even reproduce the problem any longer. Even with the PIT drivers
loaded and a message in the log about PIT clocksource it doesn't seem to
be used, so we can just drop this one anyway. I've marked this patch as
rejected in patchwork. Sorry for the noise.

Thanks
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc
@ 2016-04-28 16:36           ` James Hogan
  0 siblings, 0 replies; 17+ messages in thread
From: James Hogan @ 2016-04-28 16:36 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Maciej W. Rozycki, linux-mips

[-- Attachment #1: Type: text/plain, Size: 1144 bytes --]

On Fri, Apr 22, 2016 at 09:29:07PM +0200, Ralf Baechle wrote:
> On Fri, Apr 22, 2016 at 08:23:12PM +0100, James Hogan wrote:
> 
> > >  Not everyone uses virtualisation, so it's a functional regression for 
> > > them.  Can't you lower the priority for the timer instead so that it is 
> > > not selected by default, just as it's done with other platforms providing 
> > > a choice of timers?
> > 
> > I'll look into that. Looking back at my IRC logs I suspect I meant to
> > check why the PIT was taking priority before submitting upstream, but
> > forgot.
> 
> The PIT already has a very low rating and should only be used if everything
> else fails.  Clock scaling for example would make the cycle counter
> unusable, there might be no GIC available etc.  Otoh with SMP the PIT is
> only usable as a clock source but not clock event device.

I can't even reproduce the problem any longer. Even with the PIT drivers
loaded and a message in the log about PIT clocksource it doesn't seem to
be used, so we can just drop this one anyway. I've marked this patch as
rejected in patchwork. Sorry for the noise.

Thanks
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-04-28 16:36 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-22 17:19 [PATCH 0/4] MIPS: Guest timekeeping improvements James Hogan
2016-04-22 17:19 ` James Hogan
2016-04-22 17:19 ` [PATCH 1/4] MIPS: malta-time: Start GIC count before syncing to RTC James Hogan
2016-04-22 17:19   ` James Hogan
2016-04-22 17:19 ` [PATCH 2/4] MIPS: malta-time: Take seconds into account James Hogan
2016-04-22 17:19   ` James Hogan
2016-04-22 17:19 ` [PATCH 3/4] MIPS: malta-time: Don't use PIT timer for cevt/csrc James Hogan
2016-04-22 17:19   ` James Hogan
2016-04-22 18:57   ` Maciej W. Rozycki
2016-04-22 18:57     ` Maciej W. Rozycki
2016-04-22 19:23     ` James Hogan
2016-04-22 19:23       ` James Hogan
2016-04-22 19:29       ` Ralf Baechle
2016-04-28 16:36         ` James Hogan
2016-04-28 16:36           ` James Hogan
2016-04-22 17:19 ` [PATCH 4/4] MIPS: cevt-r4k: Dynamically calculate min_delta_ns James Hogan
2016-04-22 17:19   ` James Hogan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.