From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Hellwig <hch@infradead.org>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tim Bird <tim.bird@am.sony.com>, Sam Ravnborg <sam@ravnborg.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Steven Rostedt <srostedt@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	Daniel Walker <dwalker@mvista.com>
Subject: Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles
Date: Wed, 16 Jan 2008 09:56:04 -0500
Message-ID: <20080116145604.GB31329@Krystal>
In-Reply-To: <Pine.LNX.4.58.0801152238130.19680@gandalf.stny.rr.com>

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> [ CC'd Daniel Walker, since he had problems with this code ]
> 
> On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> >
> > I agree with you that I don't see how the compiler could reorder this.
> > So we forget about compiler barriers. Also, the clock source used is a
> > synchronized clock source (get_cycles_sync on x86_64), so it should make
> > sure the TSC is read at the right moment.
> >
> > However, what happens if the clock source is, say, the jiffies ?
> >
> > In this case, we have :
> >
> > static cycle_t jiffies_read(void)
> > {
> >         return (cycle_t) jiffies;
> > }
> >
> > Which is nothing more than a memory read of
> >
> > extern unsigned long volatile __jiffy_data jiffies;
> 
> Yep, and that's not my concern.
> 

Hrm, I will reply to the rest of this email in a separate mail, but
there is another concern, simpler than memory ordering, that just hit
me :

Suppose CPU A calls clocksource_accumulate while CPU B calls
get_monotonic_cycles, and events happen in the following order (because
of preemption or interrupts). To make things worse, assume we are on
32-bit x86, where a write to cycle_t (64 bits) is not atomic :


CPU A                  CPU B

clocksource read
update cycle_mono (1st 32 bits)
                       read cycle_mono
                       read cycle_last
                       clocksource read
                       read cycle_mono
                       read cycle_last
update cycle_mono (2nd 32 bits)
update cycle_last
update cycle_acc

Therefore, CPU B passes its retry check while holding :
- an inconsistent cycle_monotonic value (half old, half new)
- inconsistent cycle_monotonic and cycle_last values.

Or is there something I have missed ?
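
To make the interleaving above concrete, here is a minimal single-threaded
replay of it in C. It is only a sketch: struct clock_sim, struct torn_result
and torn_read_demo() are hypothetical names, the 64-bit field is modelled as
two explicit 32-bit halves, and storing the low half first is an assumption
(the actual store order is up to the compiler).

```c
#include <stdint.h>

/* Hypothetical model of the fields involved; cycle_monotonic is split into
 * halves to mimic the two 32-bit stores done on a 32-bit machine. */
struct clock_sim {
	uint32_t mono_lo;
	uint32_t mono_hi;
	uint64_t cycle_last;
};

struct torn_result {
	uint64_t seen;		/* cycle_monotonic as sampled by CPU B */
	int accepted;		/* 1 if CPU B's retry check saw no change */
};

static uint64_t load_mono(const struct clock_sim *c)
{
	return ((uint64_t)c->mono_hi << 32) | c->mono_lo;
}

/* Replays: A stores one half, B runs its whole loop, A stores the rest. */
static struct torn_result torn_read_demo(uint64_t old_mono, uint64_t new_mono)
{
	struct clock_sim c = {
		.mono_lo = (uint32_t)old_mono,
		.mono_hi = (uint32_t)(old_mono >> 32),
		.cycle_last = 0,
	};
	struct torn_result r;
	uint64_t last;

	/* CPU A: first 32-bit store (low half first is an assumption). */
	c.mono_lo = (uint32_t)new_mono;

	/* CPU B: one full pass of the loop; the clocksource read is omitted
	 * because it does not affect the retry check. */
	r.seen = load_mono(&c);
	last = c.cycle_last;
	r.accepted = (r.seen == load_mono(&c)) && (last == c.cycle_last);

	/* CPU A: second 32-bit store, too late for CPU B to notice. */
	c.mono_hi = (uint32_t)(new_mono >> 32);
	return r;
}
```

With old_mono = 0x00000000FFFFFFFF and new_mono = 0x0000000100000005, CPU B
samples 0x5, which is neither the old nor the new value, and its retry check
still passes because both of its reads fall inside the same window.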

If you really want a seqlock-free algorithm (I _do_ want this for
tracing!) :) maybe going in the RCU direction could help (I refer to my
RCU-based 32-to-64-bit lockless timestamp counter extension, which
could be turned into the clocksource updater).
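
As a rough illustration of that direction, and not the actual code of the
counter extension, here is a two-slot publish scheme written with C11 atomics.
All names (clock_snap, clock_pub, pub_*) are hypothetical. It assumes a single
updater, and it assumes a reader always finishes before the updater has run
twice more, which is the role a grace period (or disabled preemption plus a
long update period) would play in the kernel. The clocksource value is passed
in as `now` to keep the sketch deterministic.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Both values a reader needs, always published together. */
struct clock_snap {
	uint64_t cycle_last;
	uint64_t cycle_monotonic;
};

struct clock_pub {
	struct clock_snap slot[2];
	atomic_uint active;	/* index of the slot readers should use */
};

static void pub_init(struct clock_pub *p, uint64_t now)
{
	p->slot[0] = (struct clock_snap){ .cycle_last = now, .cycle_monotonic = 0 };
	p->slot[1] = p->slot[0];
	atomic_init(&p->active, 0);
}

/* Updater (assumed single): fill the inactive slot, then flip the index. */
static void pub_accumulate(struct clock_pub *p, uint64_t now, uint64_t mask)
{
	unsigned cur = atomic_load_explicit(&p->active, memory_order_relaxed);
	unsigned next = cur ^ 1u;
	uint64_t offset = (now - p->slot[cur].cycle_last) & mask;

	p->slot[next].cycle_last = now;
	p->slot[next].cycle_monotonic = p->slot[cur].cycle_monotonic + offset;

	/* Release: the slot contents become visible before the new index. */
	atomic_store_explicit(&p->active, next, memory_order_release);
}

/* Reader: no retry loop; it only ever looks at one complete snapshot.
 * Assumes the updater does not reuse this slot while we are reading it. */
static uint64_t pub_get_monotonic(struct clock_pub *p, uint64_t now,
				  uint64_t mask)
{
	unsigned idx = atomic_load_explicit(&p->active, memory_order_acquire);
	const struct clock_snap *s = &p->slot[idx];

	return s->cycle_monotonic + ((now - s->cycle_last) & mask);
}
```

The point of the design is that cycle_last and cycle_monotonic are never
updated in place, so a reader cannot pair an old value of one with a new value
of the other, and no 64-bit store is ever visible half-done.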

Mathieu

> >
> > I think it is wrong to assume that reads from clock->cycle_raw and from
> > jiffies will be ordered correctly in SMP. I am tempted to think that
> > ordering memory writes to clock->cycle_raw vs jiffies is also needed in this
> > case (where clock->cycle_raw is updated, or where jiffies is updated).
> >
> > We can fall in the same kind of issue if we read the HPET, which is
> > memory I/O based. It does not seem correct to assume that MMIO vs
> > normal memory reads are ordered. (pointing back to this article :
> > http://lwn.net/Articles/198988/)
> 
> That and the dread memory barrier thread that my head is still spinning
> on.
> 
> Ok, let's take a close look at the code in question. I may be wrong, and if
> so, great, we can fix it.
> 
> We have this in get_monotonic_cycles:
> 
> {
> 	cycle_t cycle_now, cycle_delta, cycle_monotonic, cycle_last;
> 	do {
> 		cycle_monotonic = clock->cycle_monotonic;
> 		cycle_last = clock->cycle_last;
> 		cycle_now = clocksource_read(clock);
> 		cycle_delta = (cycle_now - cycle_last) & clock->mask;
> 	} while (cycle_monotonic != clock->cycle_monotonic ||
> 		 cycle_last != clock->cycle_last);
> 	return cycle_monotonic + cycle_delta;
> }
> 
> and this in clocksource.h
> 
> static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
> {
> 	cycle_t offset = (now - cs->cycle_last) & cs->mask;
> 	cs->cycle_last = now;
> 	cs->cycle_accumulated += offset;
> 	cs->cycle_monotonic += offset;
> }
> 
> now is usually just a clocksource_read() passed in.
> 
> The goal is to have clock_monotonic always return something that is
> greater than what was read the last time.
> 
> Let's make a few assumptions now (for others to shoot them down). One
> thing is that we don't need to worry too much about MMIO, because we are
> doing a read. This means we need the data right now to continue. So the
> read being a function call should keep gcc from moving stuff around, and
> since we are doing an IO read, the order of events should be pretty much
> synchronized:
> 
>     1. load cycle_last and cycle_monotonic (we don't care which order)*
>     2. read clock source
>     3. calculate delta and while() compare (order doesn't matter)
> 
> * we might care (see below)
> 
> If the above is incorrect, then we need to fix get_monotonic_cycles.
> 
> in clocksource_accumulate, we have:
> 
>   offset = ((now = cs->read()) - cycle_last) & cs->mask;
>   cycle_last = now;
>   cycle_accumulate += offset;
>   cycle_monotonic += offset;
> 
> The order of events here is as follows. Using the same reasoning as above,
> the read must come first and complete, because for gcc it's a function, and
> for IO, it needs to return data.
> 
>   1. cs->read
>   2. update cycle_last, cycle_accumulate, cycle_monotonic.
> 
> Can we assume, if the above for get_monotonic_cycles is correct, that
> since we read and compare cycle_last and cycle_monotonic, that neither of
> them have changed over the read? So we have a snapshot of the
> clocksource_accumulate.
> 
> So the worst thing that I can think of, is that cycle_monotonic is updated
> *before* cycle_last:
> 
>    cycle_monotonic += offset;
>      <get_monotonic_cycles run on other CPU>
>    cycle_last = now;
> 
> 
> cycle_last = 5
> cycle_monotonic = 0
> 
> 
>     CPU 0                         CPU 1
>   ----------                  -------------
>  cs->read() = 10
>  offset = 10 - 5 = 5
>  cycle_monotonic = 5
>                             cycle_monotonic = 5
>                             cycle_last = 5
>                             cs->read() = 11
>                             delta = 11 - 5 = 6
>                             cycle_monotonic and cycle_last still same
>                             return 5 + 6 = 11
> 
>   cycle_last = 10
> 
>                             cycle_monotonic = 5
>                             cycle_last = 10
>                             cs->read() = 12
>                             delta = 12 - 10 = 2
>                             cycle_monotonic and cycle_last still same
>                             return 5 + 2 = 7
> 
>                            **** ERROR *****
> 
> So, we *do* need memory barriers.  Looks like cycle_last and
> cycle_monotonic need to be synchronized.
> 
> OK, will this do?
> 
> cycle_t notrace get_monotonic_cycles(void)
> {
>         cycle_t cycle_now, cycle_delta, cycle_monotonic, cycle_last;
>         do {
>                 cycle_monotonic = clock->cycle_monotonic;
>                 smp_rmb();
>                 cycle_last = clock->cycle_last;
>                 cycle_now = clocksource_read(clock);
>                 cycle_delta = (cycle_now - cycle_last) & clock->mask;
>         } while (cycle_monotonic != clock->cycle_monotonic ||
>                  cycle_last != clock->cycle_last);
>         return cycle_monotonic + cycle_delta;
> }
> 
> and this in clocksource.h
> 
> static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
> {
>         cycle_t offset = (now - cs->cycle_last) & cs->mask;
>         cs->cycle_last = now;
>         smp_wmb();
>         cs->cycle_accumulated += offset;
>         cs->cycle_monotonic += offset;
> }
> 
> We may still get to a situation where cycle_monotonic is of the old value
> and cycle_last is of the new value. That would give us a smaller delta
> than we want.
> 
> Let's look at this, with a slightly different situation.
> 
> cycle_last = 5
> cycle_monotonic = 0
> 
> 
>     CPU 0                         CPU 1
>   ----------                  -------------
>  cs->read() = 10
>  offset = 10 - 5 = 5
>  cycle_last = 10
>  cycle_monotonic = 5
> 
>                             cycle_monotonic = 5
>                             cycle_last = 10
>                             cs->read() = 12
>                             delta = 12 - 10 = 2
>                             cycle_monotonic and cycle_last still same
>                             return 5 + 2 = 7
> 
> 
>  cs->read() = 13
>  offset = 13 - 10 = 3
>  cycle_last = 13
> 
>                             cycle_monotonic = 5
>                             cycle_last = 13
>                             cs->read() = 14
>                             delta = 14 - 13 = 1
>                             cycle_monotonic and cycle_last still same
>                             return 5 + 1 = 6
> 
>                         **** ERROR ****
> 
> Crap, looks like we do need a stronger locking here :-(
> 
> Hmm, I might as well just use seq_locks, and make sure that tracing
> does not hit them.
> 
> Thanks!
> 
> -- Steve
> 
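
For reference, the sequence-counter pattern Steve falls back to can be
sketched in userspace C11 as below. This is not the kernel's seqlock API: the
names (seq_clock, seq_clock_*) are hypothetical, a single updater is assumed,
and the clocksource value is passed in as `now` so the example stays
deterministic (real code would read the clocksource inside the retry loop).

```c
#include <stdatomic.h>
#include <stdint.h>

struct seq_clock {
	atomic_uint seq;		/* odd while an update is in flight */
	_Atomic uint64_t cycle_last;
	_Atomic uint64_t cycle_monotonic;
};

static void seq_clock_init(struct seq_clock *c, uint64_t now)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->cycle_last, now);
	atomic_init(&c->cycle_monotonic, 0);
}

/* Writer side; assumes a single updater. */
static void seq_clock_accumulate(struct seq_clock *c, uint64_t now,
				 uint64_t mask)
{
	unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);
	uint64_t last = atomic_load_explicit(&c->cycle_last, memory_order_relaxed);
	uint64_t mono = atomic_load_explicit(&c->cycle_monotonic,
					     memory_order_relaxed);
	uint64_t offset = (now - last) & mask;

	/* Mark the update as in progress before touching the data. */
	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->cycle_last, now, memory_order_relaxed);
	atomic_store_explicit(&c->cycle_monotonic, mono + offset,
			      memory_order_relaxed);
	/* Publish: the counter is even again, data is consistent. */
	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader side: retry until both fields come from the same update. */
static uint64_t seq_clock_get_monotonic(struct seq_clock *c, uint64_t now,
					uint64_t mask)
{
	unsigned s0, s1;
	uint64_t last, mono;

	do {
		s0 = atomic_load_explicit(&c->seq, memory_order_acquire);
		last = atomic_load_explicit(&c->cycle_last, memory_order_relaxed);
		mono = atomic_load_explicit(&c->cycle_monotonic,
					    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((s0 & 1u) || s0 != s1);

	return mono + ((now - last) & mask);
}
```

Replaying the numbers from the second trace (cycle_last = 5, updates at 10 and
13, reads at 12 and 14), this returns 7 and then 9 instead of 7 and then 6.
The cost is that a reader which interrupts the updater on the same CPU would
spin forever in the loop, which is presumably why the tracer must not hit it.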

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
