All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Martin Bligh <mbligh@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mathieu Desnoyers <compudj@krystal.dyndns.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	darren@dvhart.com, "Frank Ch. Eigler" <fche@redhat.com>,
	systemtap-ml <systemtap@sources.redhat.com>
Subject: Re: Unified tracing buffer
Date: Mon, 22 Sep 2008 17:39:29 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LFD.1.10.0809221718100.3265@nehalem.linux-foundation.org> (raw)
In-Reply-To: <48D832B6.3010409@redhat.com>



On Mon, 22 Sep 2008, Masami Hiramatsu wrote:
> 
> Sure, atomic counter might be more expensive but accurate for ordering.

Don't be silly.

An atomic counter is no more accurate for ordering than anything else.

Why?

Because all it tells you is the ordering of the atomic increment, not of 
the caller. The atomic increment is not related to all the other ops that 
the code that you trace actually does in any shape or form, and so the 
ordering of the trace doesn't actually imply anything for the ordering of 
the operations you are tracing!

Except for a single CPU, of course, but for that case you don't need a 
sequence number either, since the ordering is entirely determined by the 
ring buffer itself.

So the counter will be more expensive (cross-cpu cache bouncing for EVERY 
SINGLE EVENT), less useful (no real meaning for people who DO want to have 
a timestamp), and it's really no more "ordered" than anything that bases 
itself on a TSC.

The fact is, you cannot order operations based on log messages unless you 
have a lock around the whole caller - absolutely _no_ amount of locking or 
atomic accesses in the log itself will guarantee ordering of the upper 
layers.

And sure, if you have locking at a higher layer, then a sequence number is 
sufficient, but on the other hand, so is a well-synchronized TSC.

So personally, I think that the optimal solution is:

 - let each ring buffer be associated with a "gettimestamp()" function, so 
   that everybody _can_ set it to something of their own. But default to 
   something sane, namely a raw TSC thing.

 - Add synchronization events to the ring buffer often enough that you can 
   make do with a _raw_ (ie unscaled) 32-bit timestamp. Possibly by simply 
   noticing when the upper 32 bits change, although you could possibly do 
   it with a heartbeat too.

 - Similarly, add a synchronization event when the TSC frequency changes.

 - Make the synchronization packet contain the full 64-bit TSC base, in 
   addition to TSC frequency info _and_ the timebase.

 - From those synchronization events, you should be able to get a very 
   accurate timestamp *after* the fact from the raw TSC numbers (ie do all 
   the scaling not when you gather the info, but when you present it), 
   even if you only spent 32 bits of TSC info on 99% of all events (an 
   just had a overflow log occasionally to get the rest of the info)

 - Most people will be _way_ happier with a timestamp that has enough 
   precision to also show ordering (assuming that the caller holds a 
   lock over the operation _including_ the tracing) than they would ever 
   be with a sequence number.

 - people who really want to can consider the incrementing counter a TSC, 
   but it will suck in so many ways that I bet it will not be very popular 
   at all. But having the option to set a special timestamp function will
   give people the option (on a per-buffer level) to make the "TSC" be a 
   simple incrementing 32-bit counter using xaddl and the upper bits 
   incrementing from a timer, but keep that as a "ok, the TSC is really 
   broken, or this architecture doesn't support any fast cycle counters at 
   all, or I really don't care about time, just sequence, and I guarantee 
   I have a single lock in all callers that makes things unambiguous"

Note the "single lock" part. It's not enough that you make any trace thing 
under a lock. They must be under the _same_ lock for all relevant events 
for you to be able to say anything about ordering. And that's actually 
pretty rare for any complex behavior.

The timestamping, btw, is likely the most important part of the whole 
logging thing. So we need to get it right. But by "right" I mean really 
really low-latency so that it's acceptable to everybody, real-time enough 
that you can tell how far apart events were, and precise enough that you 
really _can_ see ordering.

The "raw TSC value with correction information" should be able to give you 
all of that. At least on x86. On some platforms, the TSC may not give you 
enough resolution to get reasonable guesses on event ordering.

			Linus

  parent reply	other threads:[~2008-09-23  0:41 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-19 21:33 Unified tracing buffer Martin Bligh
2008-09-19 21:42 ` Randy Dunlap
2008-09-19 21:57   ` Martin Bligh
2008-09-19 22:41     ` Olaf Dabrunz
2008-09-19 22:19       ` Martin Bligh
2008-09-20  8:10         ` Olaf Dabrunz
2008-09-20  8:29         ` Steven Rostedt
2008-09-20 11:40           ` Mathieu Desnoyers
2008-09-20  8:26     ` Steven Rostedt
2008-09-20 11:44       ` Mathieu Desnoyers
2008-09-19 22:28 ` Olaf Dabrunz
2008-09-19 22:09   ` Martin Bligh
2008-09-19 23:18 ` Frank Ch. Eigler
2008-09-20  8:50   ` Steven Rostedt
2008-09-20 13:37     ` Mathieu Desnoyers
2008-09-20 13:51       ` Steven Rostedt
2008-09-20 14:54         ` Steven Rostedt
2008-09-22 18:45           ` Mathieu Desnoyers
2008-09-22 21:39             ` Steven Rostedt
2008-09-23  3:27               ` Mathieu Desnoyers
2008-09-20  0:07 ` Peter Zijlstra
2008-09-22 14:07   ` K.Prasad
2008-09-22 14:45     ` Peter Zijlstra
2008-09-22 16:29       ` Martin Bligh
2008-09-22 16:36         ` Peter Zijlstra
2008-09-22 20:50           ` Masami Hiramatsu
2008-09-23  3:05           ` Mathieu Desnoyers
2008-09-23  2:49       ` Mathieu Desnoyers
2008-09-23  5:25       ` Tom Zanussi
2008-09-23  9:31         ` Peter Zijlstra
2008-09-23 18:13           ` Mathieu Desnoyers
2008-09-23 18:13             ` Mathieu Desnoyers
2008-09-23 18:33             ` Christoph Lameter
2008-09-23 18:33               ` Christoph Lameter
2008-09-23 18:56               ` Linus Torvalds
2008-09-23 18:56                 ` Linus Torvalds
2008-09-23 13:50         ` Mathieu Desnoyers
2008-09-23 14:00         ` Martin Bligh
2008-09-23 17:55           ` K.Prasad
2008-09-23 18:27             ` Martin Bligh
2008-09-24  3:50           ` Tom Zanussi
2008-09-24  5:42             ` K.Prasad
2008-09-25  6:07             ` [RFC PATCH 0/8] current relay cleanup patchset Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 1/8] relay - Clean up relay_switch_subbuf() and make waking up consumers optional Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 2/8] relay - Make the relay sub-buffer switch code replaceable Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 3/8] relay - Add channel flags to relay, remove global callback param Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 4/8] relay - Add reserved param to switch-subbuf, in preparation for non-pad write/reserve Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 5/8] relay - Map the first sub-buffer at the end of the buffer, for temporary convenience Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 6/8] relay - Replace relay_reserve/relay_write with non-padded versions Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 7/8] relay - Remove padding-related code from relay_read()/relay_splice_read() et al Tom Zanussi
2008-09-25  6:08             ` [RFC PATCH 8/8] relay - Clean up remaining padding-related junk Tom Zanussi
2008-09-23  5:27       ` [PATCH 1/3] relay - clean up subbuf switch Tom Zanussi
2008-09-23 20:15         ` Andrew Morton
2008-09-23  5:27       ` [PATCH 2/3] relay - make subbuf switch replaceable Tom Zanussi
2008-09-23 20:17         ` Andrew Morton
2008-09-23  5:27       ` [PATCH 3/3] relay - add channel flags Tom Zanussi
2008-09-23 20:20         ` Andrew Morton
2008-09-24  3:57           ` Tom Zanussi
2008-09-20  0:26 ` Unified tracing buffer Marcel Holtmann
2008-09-20  9:03 ` Steven Rostedt
2008-09-20 13:55   ` Mathieu Desnoyers
2008-09-20 14:12     ` Arjan van de Ven
2008-09-22 18:52       ` Mathieu Desnoyers
2008-10-02 15:28         ` Jason Baron
2008-10-03 16:11           ` Mathieu Desnoyers
2008-10-03 18:37             ` Jason Baron
2008-10-03 19:10               ` Mathieu Desnoyers
2008-10-03 19:25                 ` Jason Baron
2008-10-03 19:56                   ` Mathieu Desnoyers
2008-10-03 20:25                     ` Jason Baron
2008-10-03 21:52                 ` Frank Ch. Eigler
2008-09-22  3:09     ` KOSAKI Motohiro
2008-09-22  9:57   ` Peter Zijlstra
2008-09-23  2:36     ` Mathieu Desnoyers
2008-09-22 13:57 ` K.Prasad
2008-09-22 19:45 ` Masami Hiramatsu
2008-09-22 20:13   ` Martin Bligh
2008-09-22 22:25     ` Masami Hiramatsu
2008-09-22 23:11       ` Darren Hart
2008-09-23  0:04         ` Masami Hiramatsu
2008-09-22 23:16       ` Martin Bligh
2008-09-23  0:05         ` Masami Hiramatsu
2008-09-23  0:12           ` Martin Bligh
2008-09-23 14:49             ` Masami Hiramatsu
2008-09-23 15:04               ` Mathieu Desnoyers
2008-09-23 15:30                 ` Masami Hiramatsu
2008-09-23 16:01                   ` Linus Torvalds
2008-09-23 17:04                     ` Masami Hiramatsu
2008-09-23 17:30                       ` Thomas Gleixner
2008-09-23 18:59                         ` Masami Hiramatsu
2008-09-23 19:36                           ` Thomas Gleixner
2008-09-23 19:38                             ` Martin Bligh
2008-09-23 19:41                               ` Thomas Gleixner
2008-09-23 19:50                                 ` Martin Bligh
2008-09-23 20:03                                   ` Thomas Gleixner
2008-09-23 21:02                                     ` Martin Bligh
2008-09-23 20:03                             ` Masami Hiramatsu
2008-09-23 20:08                               ` Thomas Gleixner
2008-09-23 15:46               ` Linus Torvalds
2008-09-23  0:39           ` Linus Torvalds [this message]
2008-09-23  1:26             ` Roland Dreier
2008-09-23  1:39               ` Steven Rostedt
2008-09-23  2:02               ` Mathieu Desnoyers
2008-09-23  2:26                 ` Darren Hart
2008-09-23  2:31                   ` Mathieu Desnoyers
2008-09-23  3:26               ` Linus Torvalds
2008-09-23  3:36                 ` Mathieu Desnoyers
2008-09-23  4:05                   ` Linus Torvalds
2008-09-23  3:43                 ` Steven Rostedt
2008-09-23  4:10                   ` Masami Hiramatsu
2008-09-23  4:17                     ` Martin Bligh
2008-09-23 15:23                       ` Masami Hiramatsu
2008-09-23 10:53                     ` Steven Rostedt
2008-09-23  4:19                   ` Linus Torvalds
2008-09-23 14:12                     ` Mathieu Desnoyers
2008-09-23  2:30             ` Mathieu Desnoyers
2008-09-23  3:06             ` Masami Hiramatsu
2008-09-23 14:36       ` KOSAKI Motohiro
2008-09-23 15:02         ` Frank Ch. Eigler
2008-09-23 15:21         ` Masami Hiramatsu
2008-09-23 17:59           ` KOSAKI Motohiro
2008-09-23 18:28             ` Martin Bligh
2008-09-23  3:33 ` Andi Kleen
2008-09-23  3:47   ` Martin Bligh
2008-09-23  5:04     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.1.10.0809221718100.3265@nehalem.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=compudj@krystal.dyndns.org \
    --cc=darren@dvhart.com \
    --cc=fche@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbligh@google.com \
    --cc=mhiramat@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=systemtap@sources.redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.