From: Steven Rostedt <rostedt@goodmis.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Hellwig <hch@infradead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tim Bird <tim.bird@am.sony.com>, Sam Ravnborg <sam@ravnborg.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	John Stultz <johnstul@us.ibm.com>,
	John Stultz <johstul@us.ibm.com>,
	Steven Rostedt <srostedt@redhat.com>
Subject: [RFC PATCH 12/23 -v4] Use RCU algorithm for monotonic cycles.
Date: Mon, 21 Jan 2008 10:22:43 -0500
Message-ID: <20080121152352.789802471@goodmis.org>
In-Reply-To: 20080121152231.579118762@goodmis.org

[-- Attachment #1: get-monotonic-rcu-experimental.patch --]
[-- Type: text/plain, Size: 6194 bytes --]

From: john stultz <johnstul@us.ibm.com>

On Wed, 2008-01-16 at 18:39 -0500, Mathieu Desnoyers wrote:
> I would disable preemption in clocksource_get_basecycles. We would not
> want to be scheduled out while we hold a pointer to the old array
> element.
> 
> > +	int num = cs->base_num;
> 
> Since you deal with base_num in a shared manner (not per cpu), you will
> need a smp_read_barrier_depends() here after the cs->base_num read.
> 
> You should think about reading the cs->base_num first, and _after_ that
> read the real clocksource. Here, the clocksource value is passed as
> parameter. It means that the read clocksource may have been read in the
> previous RCU window.

Here's an updated version of the patch with the suggested memory barrier
changes and the favored (1-x) index inversion. ;)  Let me know if you see
any other holes, or have any other suggestions or ideas.
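
For readers following along, the two-slot update can be sketched in
userspace C roughly as below. This is only an illustration under stated
assumptions, not the kernel code: every demo_* name is hypothetical, the
16-bit fake counter stands in for real clocksource hardware, and C11
release/acquire atomics stand in for the wmb() and
smp_read_barrier_depends() calls the patch uses.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t cycle_t;

#define DEMO_MASK 0xffffULL		/* assumption: a 16-bit hardware counter */

static cycle_t fake_counter;		/* hypothetical free-running counter */

/* Stand-in for clocksource_read(): returns only the low 16 bits. */
static cycle_t demo_read_hw(void)
{
	return fake_counter & DEMO_MASK;
}

struct base_slot {
	cycle_t cycle_base_last;	/* raw counter value at the last update */
	cycle_t cycle_base;		/* accumulated cycles at the last update */
};

struct demo_clock {
	struct base_slot base[2];
	atomic_int base_num;		/* index of the currently published slot */
	cycle_t mask;
};

/* Writer side. Updates must be serialized by the caller. */
static void demo_accumulate(struct demo_clock *c, cycle_t now)
{
	int cur = atomic_load_explicit(&c->base_num, memory_order_relaxed);
	int num = 1 - cur;		/* fill the slot readers are not using */
	cycle_t offset = (now - c->base[cur].cycle_base_last) & c->mask;

	c->base[num].cycle_base = c->base[cur].cycle_base + offset;
	c->base[num].cycle_base_last = now;
	/* Release store: slot contents become visible before the new index. */
	atomic_store_explicit(&c->base_num, num, memory_order_release);
}

/* Reader side. Takes no lock; reads the index first, then the counter. */
static cycle_t demo_get_basecycles(struct demo_clock *c)
{
	int num = atomic_load_explicit(&c->base_num, memory_order_acquire);
	cycle_t now = demo_read_hw();
	cycle_t offset = (now - c->base[num].cycle_base_last) & c->mask;

	return offset + c->base[num].cycle_base;
}
```

The sketch only stays correct if a reader finishes with its slot before
the writer comes around to reuse it, that is, before two further
updates. That matches the concern quoted above about being scheduled out
while holding the old array element.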

Still untested (my test box will free up soon, I promise!), but it builds.

[ Steven Rostedt has been running this patch with no problems ]

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

---
 include/linux/clocksource.h |   50 +++++++++++++++++++++++++++++++++++---------
 kernel/time/timekeeping.c   |   36 ++++---------------------------
 2 files changed, 45 insertions(+), 41 deletions(-)

Index: linux-mcount.git/include/linux/clocksource.h
===================================================================
--- linux-mcount.git.orig/include/linux/clocksource.h	2008-01-18 23:48:45.000000000 -0500
+++ linux-mcount.git/include/linux/clocksource.h	2008-01-18 23:48:46.000000000 -0500
@@ -87,9 +87,17 @@ struct clocksource {
 	 * more than one cache line.
 	 */
 	struct {
-		cycle_t cycle_last, cycle_accumulated, cycle_raw;
-	} ____cacheline_aligned_in_smp;
+		cycle_t cycle_last, cycle_accumulated;
 
+		/* base structure provides lock-free read
+		 * access to a virtualized 64bit counter
+		 * Uses RCU-like update.
+		 */
+		struct {
+			cycle_t cycle_base_last, cycle_base;
+		} base[2];
+		int base_num;
+	} ____cacheline_aligned_in_smp;
 	u64 xtime_nsec;
 	s64 error;
 
@@ -175,19 +183,29 @@ static inline cycle_t clocksource_read(s
 }
 
 /**
- * clocksource_get_cycles: - Access the clocksource's accumulated cycle value
+ * clocksource_get_basecycles: - get the clocksource's accumulated cycle value
  * @cs:		pointer to clocksource being read
  * @now:	current cycle value
  *
  * Uses the clocksource to return the current cycle_t value.
  * NOTE!!!: This is different from clocksource_read, because it
- * returns the accumulated cycle value! Must hold xtime lock!
+ * returns a 64bit wide accumulated value.
  */
 static inline cycle_t
-clocksource_get_cycles(struct clocksource *cs, cycle_t now)
+clocksource_get_basecycles(struct clocksource *cs)
 {
-	cycle_t offset = (now - cs->cycle_last) & cs->mask;
-	offset += cs->cycle_accumulated;
+	int num;
+	cycle_t now, offset;
+
+	preempt_disable();
+	num = cs->base_num;
+	smp_read_barrier_depends();
+	now = clocksource_read(cs);
+	offset = (now - cs->base[num].cycle_base_last);
+	offset &= cs->mask;
+	offset += cs->base[num].cycle_base;
+	preempt_enable();
+
 	return offset;
 }
 
@@ -197,14 +215,26 @@ clocksource_get_cycles(struct clocksourc
  * @now:	current cycle value
  *
  * Used to avoids clocksource hardware overflow by periodically
- * accumulating the current cycle delta. Must hold xtime write lock!
+ * accumulating the current cycle delta. Uses RCU-like update, but
+ * ***still requires the xtime_lock is held for writing!***
  */
 static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
 {
-	cycle_t offset = (now - cs->cycle_last) & cs->mask;
+	/* First update the monotonic base portion.
+	 * The dual array update method allows for lock-free reading.
+	 */
+	int num = 1 - cs->base_num;
+	cycle_t offset = (now - cs->base[1-num].cycle_base_last);
+	offset &= cs->mask;
+	cs->base[num].cycle_base = cs->base[1-num].cycle_base + offset;
+	cs->base[num].cycle_base_last = now;
+	wmb();
+	cs->base_num = num;
+
+	/* Now update the cycle_accumulated portion */
+	offset = (now - cs->cycle_last) & cs->mask;
 	cs->cycle_last = now;
 	cs->cycle_accumulated += offset;
-	cs->cycle_raw += offset;
 }
 
 /**
Index: linux-mcount.git/kernel/time/timekeeping.c
===================================================================
--- linux-mcount.git.orig/kernel/time/timekeeping.c	2008-01-18 23:48:45.000000000 -0500
+++ linux-mcount.git/kernel/time/timekeeping.c	2008-01-18 23:48:46.000000000 -0500
@@ -71,10 +71,12 @@ static struct clocksource *clock = &cloc
  */
 static inline s64 __get_nsec_offset(void)
 {
-	cycle_t cycle_delta;
+	cycle_t now, cycle_delta;
 	s64 ns_offset;
 
-	cycle_delta = clocksource_get_cycles(clock, clocksource_read(clock));
+	now = clocksource_read(clock);
+	cycle_delta = (now - clock->cycle_last) & clock->mask;
+	cycle_delta += clock->cycle_accumulated;
 	ns_offset = cyc2ns(clock, cycle_delta);
 
 	return ns_offset;
@@ -105,35 +107,7 @@ static inline void __get_realtime_clock_
 
 cycle_t notrace get_monotonic_cycles(void)
 {
-	cycle_t cycle_now, cycle_delta, cycle_raw, cycle_last;
-
-	do {
-		/*
-		 * cycle_raw and cycle_last can change on
-		 * another CPU and we need the delta calculation
-		 * of cycle_now and cycle_last happen atomic, as well
-		 * as the adding to cycle_raw. We don't need to grab
-		 * any locks, we just keep trying until get all the
-		 * calculations together in one state.
-		 *
-		 * In fact, we __cant__ grab any locks. This
-		 * function is called from the latency_tracer which can
-		 * be called anywhere. To grab any locks (including
-		 * seq_locks) we risk putting ourselves into a deadlock.
-		 */
-		cycle_raw = clock->cycle_raw;
-		cycle_last = clock->cycle_last;
-
-		/* read clocksource: */
-		cycle_now = clocksource_read(clock);
-
-		/* calculate the delta since the last update_wall_time: */
-		cycle_delta = (cycle_now - cycle_last) & clock->mask;
-
-	} while (cycle_raw != clock->cycle_raw ||
-		 cycle_last != clock->cycle_last);
-
-	return cycle_raw + cycle_delta;
+	return clocksource_get_basecycles(clock);
 }
 
 unsigned long notrace cycles_to_usecs(cycle_t cycles)
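
A note on the masked subtraction that appears throughout the patch,
(now - last) & mask: it yields the elapsed cycles even when the hardware
counter has wrapped between the two reads. A small standalone sketch,
with a hypothetical helper name and a 32-bit counter assumed purely for
illustration:

```c
#include <stdint.h>

typedef uint64_t cycle_t;

/*
 * Cycles elapsed from 'last' to 'now' on a counter whose valid bits are
 * given by 'mask'. Unsigned subtraction wraps modulo 2^64, and the mask
 * then reduces the result modulo the counter width.
 */
static cycle_t demo_cycle_delta(cycle_t now, cycle_t last, cycle_t mask)
{
	return (now - last) & mask;
}
```

For example, demo_cycle_delta(0x5, 0xfffffffb, 0xffffffff) is 10: the
counter advanced five ticks to its maximum, wrapped to zero, then
advanced five more. The result is only meaningful if the counter wrapped
at most once since 'last' was recorded, which is why
clocksource_accumulate() has to run periodically.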

-- 
