* [PATCH v3 1/3] MIPS: add a common mips_cyc2ns()
2010-04-10 6:49 [PATCH v3 0/3] add high resolution sched_clock() for MIPS Wu Zhangjin
@ 2010-04-10 6:49 ` Wu Zhangjin
2010-04-10 6:49 ` [PATCH v3 2/3] MIPS: cavium-octeon: rewrite the sched_clock() based on mips_cyc2ns() Wu Zhangjin
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Wu Zhangjin @ 2010-04-10 6:49 UTC
To: Ralf Baechle; +Cc: Wu Zhangjin, David Daney, Ralf Rösch, linux-mips
From: Wu Zhangjin <wuzhangjin@gmail.com>
Changes:
v2 -> v3:
o use 32-bit instead of 64-bit types for mult and shift, as 'struct
clocksource' does, which saves several instructions in the 32-bit
version of mips_cyc2ns().
o remove the 'easy way' of doing 128-bit arithmetic, because it does not
work with some compilers. (feedback from David)
v1 -> v2:
o rename the old mips_sched_clock() to mips_cyc2ns() and modify the
arguments to support 32-bit.
o add 32-bit support: use a smaller shift so that the 64-bit arithmetic
does not overflow quickly, balancing the overhead of 128-bit arithmetic
against the precision lost with the smaller shift.
----------------------
The high resolution sched_clock() for r4k has the same overflow problem,
and needs the same solution, as described in "MIPS: Octeon: Use
non-overflowing arithmetic in sched_clock":
"With typical mult and shift values, the calculation for Octeon's
sched_clock overflows when using 64-bit arithmetic. Use 128-bit
calculations instead."
To reduce duplication, this patch moves that solution out of
arch/mips/cavium-octeon/csrc-octeon.c into an inline function,
mips_cyc2ns(), in arch/mips/include/asm/time.h.
Two follow-up patches, one for Cavium and one for R4K, make use of this
common function.
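For readers who want to see the overflow outside the kernel, here is a
minimal user-space sketch. It is an illustration only, not the code in this
patch: the names mul_u64_u64(), cyc2ns_naive() and cyc2ns_wide() are
hypothetical, and the 128-bit product is assembled from 32-bit halves in
portable C instead of using dmultu. The final expression is meant to mirror
what the nor/dsll/dsllv sequence computes, under the assumption that
shift < 64.

```c
#include <assert.h>
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned multiply, built from 32-bit halves. */
static void mul_u64_u64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
	uint64_t p0 = a_lo * b_lo;
	uint64_t p1 = a_lo * b_hi;
	uint64_t p2 = a_hi * b_lo;
	uint64_t p3 = a_hi * b_hi;
	/* Sum of the three terms that land in bits 32..63; cannot overflow. */
	uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

	*lo = (mid << 32) | (uint32_t)p0;
	*hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Plain 64-bit version: the product silently wraps modulo 2^64. */
static uint64_t cyc2ns_naive(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	return (cyc * mult) >> shift;
}

/* 128-bit version: returns bits [shift, shift + 63] of the full product. */
static uint64_t cyc2ns_wide(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	uint64_t hi, lo;

	assert(shift < 64);
	mul_u64_u64(cyc, mult, &hi, &lo);
	/*
	 * (hi << 1) << (63 - shift) equals hi << (64 - shift), but stays
	 * well defined when shift == 0, where a direct shift by 64 would not.
	 */
	return (lo >> shift) | ((hi << 1) << (63 - shift));
}
```

With cyc = 2^40, mult = 2^30 and shift = 30, the product is 2^70, so
cyc2ns_naive() wraps while cyc2ns_wide() still returns 2^40.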
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
---
arch/mips/include/asm/time.h | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/arch/mips/include/asm/time.h b/arch/mips/include/asm/time.h
index c7f1bfe..f0ee643 100644
--- a/arch/mips/include/asm/time.h
+++ b/arch/mips/include/asm/time.h
@@ -96,4 +96,38 @@ static inline void clockevent_set_clock(struct clock_event_device *cd,
clockevents_calc_mult_shift(cd, clock, 4);
}
+static inline unsigned long long mips_cyc2ns(u64 cyc, u32 __mult, u32 __shift)
+{
+#ifdef CONFIG_32BIT
+ /*
+ * To balance the overhead of 128-bit arithmetic against the loss of
+ * precision, we choose a smaller shift to avoid a quick overflow, as
+ * X86 & ARM do. Please refer to arch/x86/kernel/tsc.c and
+ * arch/arm/plat-orion/time.c
+ */
+ return (cyc * __mult) >> __shift;
+#else /* CONFIG_64BIT */
+ /* 64-bit arithmetic can overflow, so use 128-bit */
+ u64 t1, t2, t3;
+ unsigned long long rv;
+ u64 mult, shift;
+ mult = __mult;
+ shift = __shift;
+
+ asm (
+ "dmultu\t%[cyc],%[mult]\n\t"
+ "nor\t%[t1],$0,%[shift]\n\t"
+ "mfhi\t%[t2]\n\t"
+ "mflo\t%[t3]\n\t"
+ "dsll\t%[t2],%[t2],1\n\t"
+ "dsrlv\t%[rv],%[t3],%[shift]\n\t"
+ "dsllv\t%[t1],%[t2],%[t1]\n\t"
+ "or\t%[rv],%[t1],%[rv]\n\t"
+ : [rv] "=&r" (rv), [t1] "=&r" (t1), [t2] "=&r" (t2), [t3] "=&r" (t3)
+ : [cyc] "r" (cyc), [mult] "r" (mult), [shift] "r" (shift)
+ : "hi", "lo");
+ return rv;
+#endif
+}
+
#endif /* _ASM_TIME_H */
--
1.7.0.1
* [PATCH v3 2/3] MIPS: cavium-octeon: rewrite the sched_clock() based on mips_cyc2ns()
2010-04-10 6:49 [PATCH v3 0/3] add high resolution sched_clock() for MIPS Wu Zhangjin
2010-04-10 6:49 ` [PATCH v3 1/3] MIPS: add a common mips_cyc2ns() Wu Zhangjin
@ 2010-04-10 6:49 ` Wu Zhangjin
2010-04-10 6:49 ` [PATCH v3 3/3] MIPS: r4k: Add a high resolution sched_clock() Wu Zhangjin
2010-04-10 10:55 ` [PATCH v3 0/3] add high resolution sched_clock() for MIPS Ralf Roesch
3 siblings, 0 replies; 5+ messages in thread
From: Wu Zhangjin @ 2010-04-10 6:49 UTC
To: Ralf Baechle; +Cc: Wu Zhangjin, David Daney, Ralf Rösch, linux-mips
From: Wu Zhangjin <wuzhangjin@gmail.com>
Changes from v1:
o use the new interface mips_cyc2ns() instead of the old
mips_sched_clock().
The commit "MIPS: add a common mips_cyc2ns()" moved the solution to the
64-bit calculation's overflow problem into a common mips_cyc2ns()
function in arch/mips/include/asm/time.h. This patch just rewrites the
cavium-octeon sched_clock() on top of it.
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
---
arch/mips/cavium-octeon/csrc-octeon.c | 29 ++---------------------------
1 files changed, 2 insertions(+), 27 deletions(-)
diff --git a/arch/mips/cavium-octeon/csrc-octeon.c b/arch/mips/cavium-octeon/csrc-octeon.c
index 0bf4bbe..bca0004 100644
--- a/arch/mips/cavium-octeon/csrc-octeon.c
+++ b/arch/mips/cavium-octeon/csrc-octeon.c
@@ -52,34 +52,9 @@ static struct clocksource clocksource_mips = {
unsigned long long notrace sched_clock(void)
{
- /* 64-bit arithmatic can overflow, so use 128-bit. */
-#if (__GNUC__ < 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ <= 3))
- u64 t1, t2, t3;
- unsigned long long rv;
- u64 mult = clocksource_mips.mult;
- u64 shift = clocksource_mips.shift;
- u64 cnt = read_c0_cvmcount();
+ u64 cyc = read_c0_cvmcount();
- asm (
- "dmultu\t%[cnt],%[mult]\n\t"
- "nor\t%[t1],$0,%[shift]\n\t"
- "mfhi\t%[t2]\n\t"
- "mflo\t%[t3]\n\t"
- "dsll\t%[t2],%[t2],1\n\t"
- "dsrlv\t%[rv],%[t3],%[shift]\n\t"
- "dsllv\t%[t1],%[t2],%[t1]\n\t"
- "or\t%[rv],%[t1],%[rv]\n\t"
- : [rv] "=&r" (rv), [t1] "=&r" (t1), [t2] "=&r" (t2), [t3] "=&r" (t3)
- : [cnt] "r" (cnt), [mult] "r" (mult), [shift] "r" (shift)
- : "hi", "lo");
- return rv;
-#else
- /* GCC > 4.3 do it the easy way. */
- unsigned int __attribute__((mode(TI))) t;
- t = read_c0_cvmcount();
- t = t * clocksource_mips.mult;
- return (unsigned long long)(t >> clocksource_mips.shift);
-#endif
+ return mips_cyc2ns(cyc, clocksource_mips.mult, clocksource_mips.shift);
}
void __init plat_time_init(void)
--
1.7.0.1
* [PATCH v3 3/3] MIPS: r4k: Add a high resolution sched_clock()
2010-04-10 6:49 [PATCH v3 0/3] add high resolution sched_clock() for MIPS Wu Zhangjin
2010-04-10 6:49 ` [PATCH v3 1/3] MIPS: add a common mips_cyc2ns() Wu Zhangjin
2010-04-10 6:49 ` [PATCH v3 2/3] MIPS: cavium-octeon: rewrite the sched_clock() based on mips_cyc2ns() Wu Zhangjin
@ 2010-04-10 6:49 ` Wu Zhangjin
2010-04-10 10:55 ` [PATCH v3 0/3] add high resolution sched_clock() for MIPS Ralf Roesch
3 siblings, 0 replies; 5+ messages in thread
From: Wu Zhangjin @ 2010-04-10 6:49 UTC
To: Ralf Baechle; +Cc: Wu Zhangjin, David Daney, Ralf Rösch, linux-mips
From: Wu Zhangjin <wuzhangjin@gmail.com>
(v10 -> v11:
o use 32-bit instead of 64-bit types for mult and shift, matching the new
mips_cyc2ns().
o choose a smaller scaling factor, 8, so that it overflows more slowly.
With 8, if the clock frequency is 400 MHz, it will overflow after 12509
hours (about 521 days), which is enough for generic debugging (e.g. Ftrace).
o annotate cnt32_to_63_keepwarm() with notrace.
v9 -> v10:
o use the new interface mips_cyc2ns() instead of the old
mips_sched_clock()
o add 32-bit support by using a smaller shift to balance the overhead
of 128-bit arithmetic against the loss of precision. Please refer to the
method used on the X86 & ARM platforms: arch/x86/kernel/tsc.c,
arch/arm/plat-orion/time.c.
v8 -> v9:
o Make it depend on 64BIT, because the current mips_cyc2ns() only
supports 64-bit.
v7 -> v8:
o Make it work with the existing clocksource_mips.mult and
clocksource_mips.shift, and cope with the 64-bit calculation's overflow
problem using the method introduced by David Daney in "MIPS: Octeon: Use
non-overflowing arithmetic in sched_clock".
To reduce duplication, I have moved an inline mips_cyc2ns() function
from arch/mips/cavium-octeon/csrc-octeon.c to
arch/mips/include/asm/time.h.
v6 -> v7:
o Make it depend on !CPU_FREQ and CPU_HAS_FIXED_C0_COUNT
This sched_clock() is only available when the processor has a fixed cp0
MIPS count register, or has a dynamic cp0 MIPS count register but with
CPU_FREQ disabled.
NOTE: If your processor has a fixed c0 count, please select
CPU_HAS_FIXED_C0_COUNT for it and send a related patch to Ralf.
v5 -> v6:
o hard-code the cycle2ns_scale_factor as 8, because 30 (cs->shift) is too
big. With 30, the return value of sched_clock() would overflow quickly.
o move sched_clock() back into csrc-r4k.c, as David and Sergei
recommended.
o initialize the c0 count to zero for PRINTK_TIME=y.
o drop the HR_SCHED_CLOCK option, because the current sched_clock() is stable
enough to replace the jiffies-based one.
)
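The small-shift scaling mentioned in the changelog above can be sketched in
user space as follows. This is a hedged illustration that follows the 32-bit
branch of setup_hres_sched_clock() in this patch; calc_mult() and
cyc2ns_small_shift() are hypothetical names, and plain division stands in
for do_div(). It assumes the resulting mult fits in 32 bits, which holds for
clock rates in the MHz range.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define CYC2NS_SHIFT	8

/* Scale factor for ns = (cyc * mult) >> CYC2NS_SHIFT, rounded to nearest. */
static uint32_t calc_mult(uint32_t clock_hz)
{
	uint64_t v;

	assert(clock_hz != 0);
	v = NSEC_PER_SEC << CYC2NS_SHIFT;
	v += clock_hz / 2;
	v /= clock_hz;
	/* Round an odd value up to the next even one. */
	if (v & 1)
		v++;
	return (uint32_t)v;
}

/* 64-bit multiply and shift; the product wraps modulo 2^64. */
static uint64_t cyc2ns_small_shift(uint64_t cyc, uint32_t mult)
{
	return (cyc * mult) >> CYC2NS_SHIFT;
}
```

For a 400 MHz clock this gives mult = 640, so 400000000 cycles convert to
1000000000 ns. Because mult is even, a set bit 63 in cyc contributes a
multiple of 2^64 to the product and drops out of the wrapped result, which
is why the 32-bit path needs no explicit masking of the top bit returned by
cnt32_to_63().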
This patch adds a sched_clock() based on cnt32_to_63() and the MIPS c0
count, which provides high resolution.
Without it, Ftrace for MIPS gives useless timestamp information.
Because cnt32_to_63() needs to be called at least once per half period
to work properly, this v2 revision, unlike the old version, sets up a
kernel timer to meet that requirement on MIPS processors that have a
short c0 count period.
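To make the half-period requirement concrete, here is a simplified,
single-threaded sketch of the counter-extension idea. It is not the kernel
macro from include/linux/cnt32_to_63.h: it has no memory barriers and keeps
one global state word, and extend_cnt32() and keepwarm_secs() are
hypothetical names. The top bit of the stored upper word tracks the top bit
of the hardware counter, so each half-period crossing is noticed only if the
function is called at least once per half period.

```c
#include <assert.h>
#include <stdint.h>

/* Software-maintained upper word; its top bit mirrors the counter's top bit. */
static uint32_t cnt_hi;

/*
 * Extend a free-running 32-bit counter. Bits 0..62 of the result form the
 * extended count; bit 63 is bookkeeping state and must be ignored.
 */
static uint64_t extend_cnt32(uint32_t cnt_lo)
{
	uint32_t hi = cnt_hi;

	/* Top bits differ: the counter crossed a half-period boundary. */
	if ((hi ^ cnt_lo) & 0x80000000u) {
		/* Flip the tracking bit; carry into bit 0 on a full wrap. */
		hi = (hi ^ 0x80000000u) + (hi >> 31);
		cnt_hi = hi;
	}
	return ((uint64_t)hi << 32) | cnt_lo;
}

/* Longest safe gap between calls in whole seconds: half the 32-bit period. */
static uint32_t keepwarm_secs(uint32_t clock_hz)
{
	assert(clock_hz != 0);
	return 0x80000000u / clock_hz;
}
```

At 400 MHz the half period is 2^31 / 400000000, about 5.37 s, so
keepwarm_secs() returns 5. That corresponds to the
0x80000000UL / clock * HZ jiffies interval this patch programs into the
keepwarm timer.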
Also, we initialize the c0 count to zero (just as jiffies does) in
time_init() before plat_time_init(). Without this, PRINTK_TIME=y gets
wrong timestamp information. (NOTE: some platforms already initialize the
c0 count to zero, but some do not, so this may introduce some duplication.
Perhaps a later patch is needed to remove the c0 count initialization
from those platforms?)
This is originally from arch/arm/plat-orion/time.c
This revision now works well with the function graph tracer, and
PRINTK_TIME=y gets normal timestamp information.
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
---
arch/mips/Kconfig | 12 +++++++
arch/mips/kernel/csrc-r4k.c | 76 +++++++++++++++++++++++++++++++++++++++++++
arch/mips/kernel/time.c | 5 +++
3 files changed, 93 insertions(+), 0 deletions(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index f2ead53..b302838 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1962,6 +1962,18 @@ config NR_CPUS
source "kernel/time/Kconfig"
#
+# High Resolution sched_clock() support
+#
+
+config CPU_HAS_FIXED_C0_COUNT
+ bool
+
+config CPU_SUPPORTS_HR_SCHED_CLOCK
+ bool
+ depends on CPU_HAS_FIXED_C0_COUNT || !CPU_FREQ
+ default y
+
+#
# Timer Interrupt Frequency Configuration
#
diff --git a/arch/mips/kernel/csrc-r4k.c b/arch/mips/kernel/csrc-r4k.c
index e95a3cd..92870cb 100644
--- a/arch/mips/kernel/csrc-r4k.c
+++ b/arch/mips/kernel/csrc-r4k.c
@@ -6,7 +6,9 @@
* Copyright (C) 2007 by Ralf Baechle
*/
#include <linux/clocksource.h>
+#include <linux/cnt32_to_63.h>
#include <linux/init.h>
+#include <linux/timer.h>
#include <asm/time.h>
@@ -22,6 +24,78 @@ static struct clocksource clocksource_mips = {
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
+#ifdef CONFIG_CPU_SUPPORTS_HR_SCHED_CLOCK
+/*
+ * MIPS sched_clock implementation.
+ *
+ * Because the hardware timer period is quite short and because cnt32_to_63()
+ * needs to be called at least once per half period to work properly, a kernel
+ * timer is set up to ensure this requirement is always met.
+ *
+ * Please refer to include/linux/cnt32_to_63.h, arch/arm/plat-orion/time.c and
+ * arch/mips/include/asm/time.h (mips_cyc2ns)
+ */
+
+#define CYC2NS_SHIFT 8
+static u32 mult __read_mostly;
+static u32 shift __read_mostly;
+
+unsigned long long notrace sched_clock(void)
+{
+ u64 cyc = cnt32_to_63(read_c0_count());
+
+#ifdef CONFIG_64BIT
+ /* Because we use 128-bit arithmetic to cope with the overflow
+ * problem, the trick of clearing the top bit with an even mult value
+ * no longer works, so it must be cleared at run time.
+ */
+ if (cyc & 0x8000000000000000)
+ cyc &= 0x7fffffffffffffff;
+#endif
+ return mips_cyc2ns(cyc, mult, shift);
+}
+
+static struct timer_list cnt32_to_63_keepwarm_timer;
+
+static void notrace cnt32_to_63_keepwarm(unsigned long data)
+{
+ mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
+ sched_clock();
+}
+#endif
+
+static inline void setup_hres_sched_clock(unsigned long clock)
+{
+#ifdef CONFIG_CPU_SUPPORTS_HR_SCHED_CLOCK
+ unsigned long data;
+
+#ifdef CONFIG_32BIT
+ unsigned long long v;
+
+ v = NSEC_PER_SEC;
+ v <<= CYC2NS_SHIFT;
+ v += clock/2;
+ do_div(v, clock);
+ mult = v;
+ shift = CYC2NS_SHIFT;
+ /*
+ * We want an even value to automatically clear the top bit
+ * returned by cnt32_to_63() without an additional run time
+ * instruction. So if the LSB is 1 then round it up.
+ */
+ if (mult & 1)
+ mult++;
+#else
+ mult = clocksource_mips.mult;
+ shift = clocksource_mips.shift;
+#endif
+
+ data = 0x80000000UL / clock * HZ;
+ setup_timer(&cnt32_to_63_keepwarm_timer, cnt32_to_63_keepwarm, data);
+ mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
+#endif
+}
+
int __init init_r4k_clocksource(void)
{
if (!cpu_has_counter || !mips_hpt_frequency)
@@ -32,6 +106,8 @@ int __init init_r4k_clocksource(void)
clocksource_set_clock(&clocksource_mips, mips_hpt_frequency);
+ setup_hres_sched_clock(mips_hpt_frequency);
+
clocksource_register(&clocksource_mips);
return 0;
diff --git a/arch/mips/kernel/time.c b/arch/mips/kernel/time.c
index fb74974..86cf18a 100644
--- a/arch/mips/kernel/time.c
+++ b/arch/mips/kernel/time.c
@@ -119,6 +119,11 @@ static __init int cpu_has_mfc0_count_bug(void)
void __init time_init(void)
{
+#ifdef CONFIG_CPU_SUPPORTS_HR_SCHED_CLOCK
+ if (!mips_clockevent_init() || !cpu_has_mfc0_count_bug())
+ write_c0_count(0);
+#endif
+
plat_time_init();
if (!mips_clockevent_init() || !cpu_has_mfc0_count_bug())
--
1.7.0.1
* Re: [PATCH v3 0/3] add high resolution sched_clock() for MIPS
2010-04-10 6:49 [PATCH v3 0/3] add high resolution sched_clock() for MIPS Wu Zhangjin
` (2 preceding siblings ...)
2010-04-10 6:49 ` [PATCH v3 3/3] MIPS: r4k: Add a high resolution sched_clock() Wu Zhangjin
@ 2010-04-10 10:55 ` Ralf Roesch
3 siblings, 0 replies; 5+ messages in thread
From: Ralf Roesch @ 2010-04-10 10:55 UTC
To: Wu Zhangjin; +Cc: Ralf Baechle, David Daney, linux-mips
I applied your patch set against tip/rt/2.6.33 (kernel.org) and it works
fine on our TX4938 based Fieldbuscontroller which uses the r4k-based
timer clocksource. Thanks!
(32bit version)
Tested-by: Ralf Roesch <ralf.roesch@rw-gmbh.de>
On Sat Apr 10 2010 08:49:56 GMT+0200 (CET), Wu Zhangjin
<wuzhangjin@gmail.com> wrote:
> From: Wu Zhangjin<wuzhangjin@gmail.com>
>
> Hi, Ralf, hi David.
>
> I have tested it again with 32-bit and 64-bit kernels on a Yeeloong netbook,
> and both work well, so it should be ready to apply now.
>
> BTW:
>
> To David: if the first two patches are OK for you, could you give an
> "Acked-by:"? Thanks!
>
> To Ralf Rösch: does this 32-bit version work for you? If yes, your
> Tested-by: is welcome, thanks ;)
>
> ----------------
>
> Changes:
>
> v2 -> v3:
>
> o remove the 'easy way' of doing 128-bit arithmetic from mips_cyc2ns().
> o use 32-bit instead of 64-bit types for the input arguments (mult and shift), as
> 'struct clocksource' does.
> o use a smaller scaling factor, 8. With this factor, if the clock frequency
> is 400 MHz, it will overflow after about 521 days.
>
> v1 -> v2:
>
> o Add 32-bit support, using a smaller scaling factor (shift) to avoid 128-bit
> arithmetic; of course, this loses some precision.
>
> o Add test results for the overhead of sched_clock() in a 64-bit kernel
>
> Clock func / overhead (ns)   Min   Avg     Max   Jitter   Std.Dev.
> -----------------------------------------------------------------
> sched_clock(cnt32_to_63)     105   116.2   236   131      9.5
> getnstimeofday()             160   167.1   437   277      15
> -----------------------------------------------------------------
>
> As we can see, the cnt32_to_63() based sched_clock() has lower overhead.
>
> ----------------
>
> This patchset adds a high resolution version of sched_clock() for r4k MIPS.
>
> The generic sched_clock() is jiffies based and has very poor resolution (1 ms with
> HZ set to 1000). This one is based on the r4k c0 count, and its resolution is
> a few ns (2.5 ns with a 400 MHz clock frequency).
>
> To cope with the overflow of the 32-bit c0 count, we use the
> cnt32_to_63() method from include/linux/cnt32_to_63.h to convert the
> 32-bit counter to a virtual 63-bit counter.
>
> To fix the overflow of the 64-bit arithmetic (cycles * mult) in a 64-bit
> kernel, we use the 128-bit arithmetic contributed by David. For a 32-bit
> kernel, to balance the overhead of 128-bit arithmetic against the loss of precision, we
> use the method from X86 (arch/x86/kernel/tsc.c) and
> ARM (arch/arm/plat-orion/time.c): just use a smaller scaling factor and do 64-bit
> arithmetic. Of course, this also overflows, but not as quickly.
>
> Regards,
> Wu Zhangjin
>
> Wu Zhangjin (3):
> MIPS: add a common mips_cyc2ns()
> MIPS: cavium-octeon: rewrite the sched_clock() based on mips_cyc2ns()
> MIPS: r4k: Add a high resolution sched_clock()
>
> arch/mips/Kconfig | 12 +++++
> arch/mips/cavium-octeon/csrc-octeon.c | 29 +------------
> arch/mips/include/asm/time.h | 34 +++++++++++++++
> arch/mips/kernel/csrc-r4k.c | 76 +++++++++++++++++++++++++++++++++
> arch/mips/kernel/time.c | 5 ++
> 5 files changed, 129 insertions(+), 27 deletions(-)
>
>