From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Roland Dreier <rdreier@cisco.com>,
	Masami Hiramatsu <mhiramat@redhat.com>,
	Martin Bligh <mbligh@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	darren@dvhart.com, "Frank Ch. Eigler" <fche@redhat.com>,
	systemtap-ml <systemtap@sources.redhat.com>
Subject: Re: Unified tracing buffer
Date: Tue, 23 Sep 2008 10:12:05 -0400
Message-ID: <20080923141205.GB23185@Krystal>
In-Reply-To: <alpine.LFD.1.10.0809222106090.3265@nehalem.linux-foundation.org>

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> 
> 
> On Mon, 22 Sep 2008, Steven Rostedt wrote:
> > 
> > But, with that, with a global atomic counter, and the following trace:
> > 
> > cpu 0: trace_point_a
> > cpu 1: trace_point_c
> > cpu 0: trace_point_b
> > cpu 1: trace_point_d
> > 
> > Could the event a really come after event d, even though we already hit 
> > event b?
> 
> Each tracepoint will basically give a partial ordering (if you make it so, 
> of course - and on x86 it's hard to avoid it).
> 
> And with many trace-points, you can narrow down ordering if you're lucky.
> 
> But say that you have code like
> 
> 	CPU#1		CPU#2
> 
> 	trace_a		trace_c
> 	..		..
> 	trace_b		trace_d
> 
> and since each CPU itself is obviously strictly ordered, you a priori know 
> that a < b, and c < d. But your trace buffer can look many different ways:
> 
>  - a -> b -> c -> d
>    c -> d -> a -> b
> 
>    Now you do know that what happened between c and d must all have 
>    happened entirely after/before the things that happened between
>    a and b, and there is no overlap.
> 
>    This is only assuming the x86 full memory barrier from a "lock xadd" of 
>    course, but those are the semantics you'd get on x86. On others, the 
>    ordering might not be that strong.
> 

Hrm, Documentation/atomic_ops.txt states that:

"Unlike the above routines, it is required that explicit memory
barriers are performed before and after the operation.  It must be
done such that all memory operations before and after the atomic
operation calls are strongly ordered with respect to the atomic
operation itself."

So on architectures with weaker ordering, the kernel atomic ops that
return a value (atomic_inc_return() and friends) already require explicit
smp_mb() barriers before and after the atomic increment. The same applies
to cmpxchg.

Therefore I think it's ok, given the semantics provided by these two
atomic operations, to assume they imply an smp_mb() on any given
architecture. If not, the architecture-specific implementation would be
broken wrt those semantics.
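
A rough userspace sketch of that assumption, using C11 atomics rather than
the kernel API. The names trace_seq and trace_next_seq() are made up for
illustration, and the two explicit fences only approximate what smp_mb()
before and after the increment would give on a weakly ordered architecture:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global event counter shared by every CPU/thread. */
static atomic_uint_fast64_t trace_seq;

/*
 * Hand out the next sequence number.
 *
 * The increment itself is relaxed; the fences around it stand in for the
 * explicit barriers the documentation asks for, so memory accesses issued
 * before the call are ordered before the increment and accesses issued
 * after the call are ordered after it.
 */
static uint64_t trace_next_seq(void)
{
	uint64_t seq;

	atomic_thread_fence(memory_order_seq_cst);	/* "smp_mb() before" */
	seq = atomic_fetch_add_explicit(&trace_seq, 1,
					memory_order_relaxed) + 1;
	atomic_thread_fence(memory_order_seq_cst);	/* "smp_mb() after" */

	/* A 64-bit counter is not expected to wrap back to zero. */
	assert(seq != 0);
	return seq;
}
```

Each tracepoint would call trace_next_seq() once and store the result in
its event record. On x86 the fences cost little on top of the locked
increment; elsewhere they are the price of the ordering guarantee.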

>  - a -> c -> b -> d
>    a -> c -> d -> b
> 
>    With these trace point orderings, you really don't know anything at all 
>    about the order of any access that happened in between. CPU#1 might 
>    have gone first. Or not. Or partially. You simply do not know.
> 

Yep. If two "real kernel" events fall within the same overlapping time
window, there is not much we can know about their order. Adding tracing
statements before and after traced kernel operations could help make this
window as small as possible, but I doubt it's worth the performance
penalty and event duplication (and increased trace size).
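
To make the four orderings above concrete, here is a small sketch of how a
trace reader could classify two per-CPU windows from their sequence
numbers. The struct and function names are hypothetical, not taken from
any existing tracer:

```c
#include <assert.h>
#include <stdint.h>

/* What the trace lets us conclude about two windows. */
enum window_order {
	WINDOW_BEFORE,	/* x ended before y began: x's work came first */
	WINDOW_AFTER,	/* y ended before x began: y's work came first */
	WINDOW_UNKNOWN,	/* windows overlap: order of the work is unknown */
};

/* Sequence numbers of the two tracepoints bracketing some work on one CPU. */
struct trace_window {
	uint64_t begin_seq;
	uint64_t end_seq;
};

static enum window_order window_compare(struct trace_window x,
					struct trace_window y)
{
	/* Each CPU is strictly ordered, so begin < end within a window. */
	assert(x.begin_seq < x.end_seq);
	assert(y.begin_seq < y.end_seq);

	if (x.end_seq < y.begin_seq)
		return WINDOW_BEFORE;
	if (y.end_seq < x.begin_seq)
		return WINDOW_AFTER;
	return WINDOW_UNKNOWN;
}
```

With x = (a, b) and y = (c, d), the orderings a -> b -> c -> d and
c -> d -> a -> b give WINDOW_BEFORE and WINDOW_AFTER, while a -> c -> b -> d
and a -> c -> d -> b both give WINDOW_UNKNOWN.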

Mathieu


> > But I guess you are stating the fact that what the computer does 
> > internally, no one really knows. Without the help of real memory barriers, 
> > ordering of memory accesses is mostly determined by tarot cards.
> 
> Well, x86 defines a memory order. But what I'm trying to explain is that 
> memory order still doesn't actually specify what happens to the code that 
> actually does tracing! The trace is only going to show the order of the 
> tracepoints, not the _other_ memory accesses. So you'll have *some* 
> information, but it's very partial.
> 
> And the thing is, all those other memory accesses are the ones that do all 
> the real work. You'll know they happened _somewhere_ between two 
> tracepoints, but not much more than that.
> 
> This is why timestamps aren't really any worse than sequence numbers in 
> all practical matters. They'll get you close enough that you can consider 
> them equivalent to a cache-coherent counter, just one that you don't have 
> to take a cache miss for, and that increments on its own!
> 
> Quite a lot of CPU's have nice, dependable, TSC's that run at constant 
> frequency. 
> 
> And quite a lot of traces care a _lot_ about real time. When you do IO 
> tracing, the problem is almost never about lock ordering or anything like 
> that. You want to see how long a request took. You don't care AT ALL how 
> many tracepoints were in between the beginning and end, you care about how 
> many microseconds there were!
> 
> 			Linus
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

