From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Nicolas Pitre <nico@cam.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
torvalds@linux-foundation.org, rmk+lkml@arm.linux.org.uk,
dhowells@redhat.com, mingo@elte.hu, a.p.zijlstra@chello.nl,
linux-kernel@vger.kernel.org, ralf@linux-mips.org,
benh@kernel.crashing.org, paulus@samba.org, davem@davemloft.net,
mingo@redhat.com, tglx@linutronix.de, rostedt@goodmis.org,
linux-arch@vger.kernel.org
Subject: [PATCH] convert cnt32_to_63 to inline
Date: Tue, 11 Nov 2008 13:28:00 -0500 [thread overview]
Message-ID: <20081111182759.GA8052@Krystal> (raw)
In-Reply-To: <alpine.LFD.2.00.0811101917450.13034@xanadu.home>
* Nicolas Pitre (nico@cam.org) wrote:
> On Mon, 10 Nov 2008, Andrew Morton wrote:
>
> > On Mon, 10 Nov 2008 18:15:32 -0500 (EST)
> > Nicolas Pitre <nico@cam.org> wrote:
> >
> > > On Mon, 10 Nov 2008, Andrew Morton wrote:
> > >
> > > > This references its second argument twice, which can cause correctness
> > > > or efficiency problems.
> > > >
> > > > There is no reason that this had to be implemented in cpp.
> > > > Implementing it in C will fix the above problem.
> > >
> > > No, it won't, for correctness and efficiency reasons.
> > >
> > > And I've explained why already.
> >
> > I'd be very surprised if you've really found a case where a macro is
> > faster than an inlined function. I don't think that has happened
> > before.
>
> That hasn't anything to do with "a macro is faster" at all. It's all
> about the order used to evaluate provided arguments. And the first one
> might be anything like a memory value, an IO operation, an expression,
> etc. An inline function would work correctly with pointers only and
> therefore totally break apart on x86 for example.
>
>
> Nicolas
Let's see what it gives once implemented. Only compile-tested. Assumes
pxa, sa110 and mn10300 are all UP-only. Correct smp_rmb() are used for
arm versatile.
Turn cnt32_to_63 into an inline function.
Change all callers to new API.
Document barrier usage.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Nicolas Pitre <nico@cam.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: torvalds@linux-foundation.org
CC: rmk+lkml@arm.linux.org.uk
CC: dhowells@redhat.com
CC: paulus@samba.org
CC: a.p.zijlstra@chello.nl
CC: mingo@elte.hu
CC: benh@kernel.crashing.org
CC: rostedt@goodmis.org
CC: tglx@linutronix.de
CC: davem@davemloft.net
CC: ralf@linux-mips.org
---
arch/arm/mach-pxa/time.c | 14 ++++++++++++-
arch/arm/mach-sa1100/generic.c | 15 +++++++++++++-
arch/arm/mach-versatile/core.c | 12 ++++++++++-
arch/mn10300/kernel/time.c | 19 +++++++++++++-----
include/linux/cnt32_to_63.h | 42 +++++++++++++++++++++++++++++------------
5 files changed, 82 insertions(+), 20 deletions(-)
Index: linux.trees.git/arch/arm/mach-pxa/time.c
===================================================================
--- linux.trees.git.orig/arch/arm/mach-pxa/time.c 2008-11-11 12:20:42.000000000 -0500
+++ linux.trees.git/arch/arm/mach-pxa/time.c 2008-11-11 13:05:01.000000000 -0500
@@ -37,6 +37,10 @@
#define OSCR2NS_SCALE_FACTOR 10
static unsigned long oscr2ns_scale;
+static u32 sched_clock_cnt_hi; /*
+ * Shared cnt_hi OK with cycle counter only
+ * for UP systems.
+ */
static void __init set_oscr2ns_scale(unsigned long oscr_rate)
{
@@ -54,7 +58,15 @@ static void __init set_oscr2ns_scale(uns
unsigned long long sched_clock(void)
{
- unsigned long long v = cnt32_to_63(OSCR);
+ u32 cnt_lo, cnt_hi;
+ unsigned long long v;
+
+ preempt_disable_notrace();
+ cnt_hi = sched_clock_cnt_hi;
+ barrier(); /* read cnt_hi before cnt_lo */
+ cnt_lo = OSCR;
+ v = cnt32_to_63(cnt_hi, cnt_lo, &sched_clock_cnt_hi);
+ preempt_enable_notrace();
return (v * oscr2ns_scale) >> OSCR2NS_SCALE_FACTOR;
}
Index: linux.trees.git/include/linux/cnt32_to_63.h
===================================================================
--- linux.trees.git.orig/include/linux/cnt32_to_63.h 2008-11-11 12:20:17.000000000 -0500
+++ linux.trees.git/include/linux/cnt32_to_63.h 2008-11-11 13:10:44.000000000 -0500
@@ -32,7 +32,9 @@ union cnt32_to_63 {
/**
* cnt32_to_63 - Expand a 32-bit counter to a 63-bit counter
- * @cnt_lo: The low part of the counter
+ * @cnt_hi: The high part of the counter (read first)
+ * @cnt_lo: The low part of the counter (read after cnt_hi)
+ * @cnt_hi_ptr: Pointer to high part of the counter
*
* Many hardware clock counters are only 32 bits wide and therefore have
* a relatively short period making wrap-arounds rather frequent. This
@@ -57,7 +59,10 @@ union cnt32_to_63 {
* code must be executed at least once per each half period of the 32-bit
* counter to properly update the state bit in memory. This is usually not a
* problem in practice, but if it is then a kernel timer could be scheduled
- * to manage for this code to be executed often enough.
+ * to manage for this code to be executed often enough. If a per-cpu cnt_hi is
+ * used, the value must be updated at least once per 32-bits half-period on each
+ * CPU. If cnt_hi is shared between CPUs, it suffice to update it once per
+ * 32-bits half-period on any CPU.
*
* Note that the top bit (bit 63) in the returned value should be considered
* as garbage. It is not cleared here because callers are likely to use a
@@ -65,16 +70,29 @@ union cnt32_to_63 {
* implicitly by making the multiplier even, therefore saving on a runtime
* clear-bit instruction. Otherwise caller must remember to clear the top
* bit explicitly.
+ *
+ * Preemption must be disabled when reading the cnt_hi and cnt_lo values and
+ * calling this function.
+ *
+ * The cnt_hi parameter _must_ be read before cnt_lo. This implies using the
+ * proper barriers :
+ * - smp_rmb() if cnt_lo is read from mmio and the cnt_hi variable is shared
+ * across CPUs.
+ * - use a per-cpu variable for cnt_high if cnt_lo is read from per-cpu cycles
+ * counters or to read the counters with only a barrier().
*/
-#define cnt32_to_63(cnt_lo) \
-({ \
- static volatile u32 __m_cnt_hi; \
- union cnt32_to_63 __x; \
- __x.hi = __m_cnt_hi; \
- __x.lo = (cnt_lo); \
- if (unlikely((s32)(__x.hi ^ __x.lo) < 0)) \
- __m_cnt_hi = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31); \
- __x.val; \
-})
+static inline u64 cnt32_to_63(u32 cnt_hi, u32 cnt_lo, u32 *cnt_hi_ptr)
+{
+ union cnt32_to_63 __x = {
+ {
+ .hi = cnt_hi,
+ .lo = cnt_lo,
+ },
+ };
+
+ if (unlikely((s32)(__x.hi ^ __x.lo) < 0))
+ *cnt_hi_ptr = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31);
+ return __x.val; /* Remember to clear the top bit in the caller */
+}
#endif
Index: linux.trees.git/arch/arm/mach-sa1100/generic.c
===================================================================
--- linux.trees.git.orig/arch/arm/mach-sa1100/generic.c 2008-11-11 12:20:42.000000000 -0500
+++ linux.trees.git/arch/arm/mach-sa1100/generic.c 2008-11-11 13:05:10.000000000 -0500
@@ -34,6 +34,11 @@
unsigned int reset_status;
EXPORT_SYMBOL(reset_status);
+static u32 sched_clock_cnt_hi; /*
+ * Shared cnt_hi OK with cycle counter only
+ * for UP systems.
+ */
+
#define NR_FREQS 16
/*
@@ -133,7 +138,15 @@ EXPORT_SYMBOL(cpufreq_get);
*/
unsigned long long sched_clock(void)
{
- unsigned long long v = cnt32_to_63(OSCR);
+ u32 cnt_lo, cnt_hi;
+ unsigned long long v;
+
+ preempt_disable_notrace();
+ cnt_hi = sched_clock_cnt_hi;
+ barrier(); /* read cnt_hi before cnt_lo */
+ cnt_lo = OSCR;
+ v = cnt32_to_63(cnt_hi, cnt_lo, &sched_clock_cnt_hi);
+ preempt_enable_notrace();
/* the <<1 gets rid of the cnt_32_to_63 top bit saving on a bic insn */
v *= 78125<<1;
Index: linux.trees.git/arch/arm/mach-versatile/core.c
===================================================================
--- linux.trees.git.orig/arch/arm/mach-versatile/core.c 2008-11-11 12:20:42.000000000 -0500
+++ linux.trees.git/arch/arm/mach-versatile/core.c 2008-11-11 12:57:55.000000000 -0500
@@ -60,6 +60,8 @@
#define VA_VIC_BASE __io_address(VERSATILE_VIC_BASE)
#define VA_SIC_BASE __io_address(VERSATILE_SIC_BASE)
+static u32 sched_clock_cnt_hi;
+
static void sic_mask_irq(unsigned int irq)
{
irq -= IRQ_SIC_START;
@@ -238,7 +240,15 @@ void __init versatile_map_io(void)
*/
unsigned long long sched_clock(void)
{
- unsigned long long v = cnt32_to_63(readl(VERSATILE_REFCOUNTER));
+ u32 cnt_lo, cnt_hi;
+ unsigned long long v;
+
+ preempt_disable_notrace();
+ cnt_hi = sched_clock_cnt_hi;
+ smp_rmb();
+ cnt_lo = readl(VERSATILE_REFCOUNTER);
+ v = cnt32_to_63(cnt_hi, cnt_lo, &sched_clock_cnt_hi);
+ preempt_enable_notrace();
/* the <<1 gets rid of the cnt_32_to_63 top bit saving on a bic insn */
v *= 125<<1;
Index: linux.trees.git/arch/mn10300/kernel/time.c
===================================================================
--- linux.trees.git.orig/arch/mn10300/kernel/time.c 2008-11-11 12:41:42.000000000 -0500
+++ linux.trees.git/arch/mn10300/kernel/time.c 2008-11-11 13:04:42.000000000 -0500
@@ -29,6 +29,11 @@ unsigned long mn10300_iobclk; /* system
unsigned long mn10300_tsc_per_HZ; /* number of ioclks per jiffy */
#endif /* CONFIG_MN10300_RTC */
+static u32 sched_clock_cnt_hi; /*
+ * shared cnt_hi OK with cycle counter only
+ * for UP systems.
+ */
+
static unsigned long mn10300_last_tsc; /* time-stamp counter at last time
* interrupt occurred */
@@ -52,18 +57,22 @@ unsigned long long sched_clock(void)
unsigned long long ll;
unsigned l[2];
} tsc64, result;
- unsigned long tsc, tmp;
+ unsigned long tmp;
unsigned product[3]; /* 96-bit intermediate value */
+ u32 cnt_lo, cnt_hi;
- /* read the TSC value
- */
- tsc = 0 - get_cycles(); /* get_cycles() counts down */
+ preempt_disable_notrace();
+ cnt_hi = sched_clock_cnt_hi;
+ barrier(); /* read cnt_hi before cnt_lo */
+ cnt_lo = 0 - get_cycles(); /* get_cycles() counts down */
/* expand to 64-bits.
* - sched_clock() must be called once a minute or better or the
* following will go horribly wrong - see cnt32_to_63()
*/
- tsc64.ll = cnt32_to_63(tsc) & 0x7fffffffffffffffULL;
+ tsc64.ll = cnt32_to_63(cnt_hi, cnt_lo, &sched_clock_cnt_hi);
+ tsc64.ll &= 0x7fffffffffffffffULL;
+ preempt_enable_notrace();
/* scale the 64-bit TSC value to a nanosecond value via a 96-bit
* intermediate
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2008-11-11 18:28 UTC|newest]
Thread overview: 118+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-07 5:23 [RFC patch 00/18] Trace Clock v2 Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 01/18] get_cycles() : kconfig HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 02/18] get_cycles() : x86 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 03/18] get_cycles() : sparc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 04/18] get_cycles() : powerpc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07 14:56 ` Josh Boyer
2008-11-07 18:14 ` Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 05/18] get_cycles() : MIPS HAVE_GET_CYCLES_32 Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 06/18] Trace clock generic Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 07/18] Trace clock core Mathieu Desnoyers
2008-11-07 5:52 ` Andrew Morton
2008-11-07 6:16 ` Mathieu Desnoyers
2008-11-07 6:26 ` Andrew Morton
2008-11-07 16:12 ` Mathieu Desnoyers
2008-11-07 16:19 ` Andrew Morton
2008-11-07 18:16 ` Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 08/18] cnt32_to_63 should use smp_rmb() Mathieu Desnoyers
2008-11-07 6:05 ` Andrew Morton
2008-11-07 8:12 ` Nicolas Pitre
2008-11-07 8:38 ` Andrew Morton
2008-11-07 11:20 ` David Howells
2008-11-07 15:01 ` Nicolas Pitre
2008-11-07 15:50 ` Andrew Morton
2008-11-07 16:21 ` David Howells
2008-11-07 16:29 ` Andrew Morton
2008-11-07 17:10 ` David Howells
2008-11-07 17:26 ` Andrew Morton
2008-11-07 18:00 ` Mathieu Desnoyers
2008-11-07 18:21 ` Andrew Morton
2008-11-07 18:30 ` Harvey Harrison
2008-11-07 18:42 ` Mathieu Desnoyers
2008-11-07 18:33 ` Mathieu Desnoyers
2008-11-07 18:36 ` Linus Torvalds
2008-11-07 18:45 ` Andrew Morton
2008-11-07 16:47 ` Nicolas Pitre
2008-11-07 16:55 ` David Howells
2008-11-07 17:21 ` Andrew Morton
2008-11-07 20:03 ` Nicolas Pitre
2008-11-07 16:07 ` David Howells
2008-11-07 16:47 ` Mathieu Desnoyers
2008-11-07 17:04 ` David Howells
2008-11-07 17:17 ` Mathieu Desnoyers
2008-11-07 23:27 ` David Howells
2008-11-07 20:11 ` Russell King
2008-11-07 21:36 ` Mathieu Desnoyers
2008-11-07 22:18 ` Russell King
2008-11-07 22:36 ` Mathieu Desnoyers
2008-11-07 23:41 ` David Howells
2008-11-08 0:15 ` Russell King
2008-11-08 0:45 ` David Howells
2008-11-08 15:24 ` Nicolas Pitre
2008-11-08 23:20 ` [PATCH] clarify usage expectations for cnt32_to_63() Nicolas Pitre
2008-11-09 2:25 ` Mathieu Desnoyers
2008-11-09 2:54 ` Nicolas Pitre
2008-11-09 5:06 ` Nicolas Pitre
2008-11-09 5:27 ` [PATCH v2] " Nicolas Pitre
2008-11-09 6:48 ` Mathieu Desnoyers
2008-11-09 13:34 ` Nicolas Pitre
2008-11-09 13:43 ` Russell King
2008-11-09 16:22 ` Mathieu Desnoyers
2008-11-10 4:20 ` Nicolas Pitre
2008-11-10 4:42 ` Andrew Morton
2008-11-10 21:34 ` Nicolas Pitre
2008-11-10 21:58 ` Andrew Morton
2008-11-10 23:15 ` Nicolas Pitre
2008-11-10 23:22 ` Andrew Morton
2008-11-10 23:38 ` Steven Rostedt
2008-11-11 0:26 ` Nicolas Pitre
2008-11-11 18:28 ` Mathieu Desnoyers [this message]
2008-11-11 19:13 ` [PATCH] convert cnt32_to_63 to inline Russell King
2008-11-11 20:11 ` Mathieu Desnoyers
2008-11-11 21:51 ` Russell King
2008-11-12 3:48 ` Mathieu Desnoyers
2008-11-11 21:00 ` Nicolas Pitre
2008-11-11 21:13 ` Russell King
2008-11-11 22:31 ` David Howells
2008-11-11 22:37 ` Peter Zijlstra
2008-11-12 1:13 ` Steven Rostedt
2008-11-07 11:03 ` [RFC patch 08/18] cnt32_to_63 should use smp_rmb() David Howells
2008-11-07 16:51 ` Mathieu Desnoyers
2008-11-07 20:18 ` Nicolas Pitre
2008-11-07 23:55 ` David Howells
2008-11-07 10:59 ` David Howells
2008-11-07 10:55 ` David Howells
2008-11-07 17:09 ` Mathieu Desnoyers
2008-11-07 17:33 ` Steven Rostedt
2008-11-07 19:18 ` Mathieu Desnoyers
2008-11-07 19:32 ` Peter Zijlstra
2008-11-07 20:02 ` Mathieu Desnoyers
2008-11-07 20:45 ` Mathieu Desnoyers
2008-11-07 20:54 ` Paul E. McKenney
2008-11-07 21:04 ` Steven Rostedt
2008-11-08 0:34 ` Paul E. McKenney
2008-11-07 21:16 ` Mathieu Desnoyers
2008-11-07 23:50 ` David Howells
2008-11-08 0:55 ` Steven Rostedt
2008-11-09 11:51 ` David Howells
2008-11-09 14:31 ` Steven Rostedt
2008-11-09 16:18 ` Mathieu Desnoyers
2008-11-07 20:08 ` Steven Rostedt
2008-11-07 20:55 ` Paul E. McKenney
2008-11-07 21:27 ` Mathieu Desnoyers
2008-11-07 20:36 ` Nicolas Pitre
2008-11-07 20:55 ` Mathieu Desnoyers
2008-11-07 21:22 ` Nicolas Pitre
2008-11-07 5:23 ` [RFC patch 09/18] Powerpc : Trace clock Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 10/18] Sparc64 " Mathieu Desnoyers
2008-11-07 5:45 ` David Miller
2008-11-07 5:23 ` [RFC patch 11/18] LTTng timestamp sh Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 12/18] LTTng - TSC synchronicity test Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 13/18] x86 : remove arch-specific tsc_sync.c Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 14/18] MIPS use tsc_sync.c Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 15/18] MIPS : export hpt frequency for trace_clock Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 16/18] MIPS create empty sync_core() Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 17/18] MIPS : Trace clock Mathieu Desnoyers
2008-11-07 11:53 ` Peter Zijlstra
2008-11-07 17:44 ` Mathieu Desnoyers
2008-11-07 5:23 ` [RFC patch 18/18] x86 trace clock Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081111182759.GA8052@Krystal \
--to=mathieu.desnoyers@polymtl.ca \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=davem@davemloft.net \
--cc=dhowells@redhat.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mingo@redhat.com \
--cc=nico@cam.org \
--cc=paulus@samba.org \
--cc=ralf@linux-mips.org \
--cc=rmk+lkml@arm.linux.org.uk \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox