public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Masami Hiramatsu <mhiramat@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Bligh <mbligh@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mathieu Desnoyers <compudj@krystal.dyndns.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	darren@dvhart.com, "Frank Ch. Eigler" <fche@redhat.com>,
	systemtap-ml <systemtap@sources.redhat.com>
Subject: Re: Unified tracing buffer
Date: Mon, 22 Sep 2008 23:06:09 -0400	[thread overview]
Message-ID: <48D85D21.7060801@redhat.com> (raw)
In-Reply-To: <alpine.LFD.1.10.0809221718100.3265@nehalem.linux-foundation.org>

Hi Linus,

Linus Torvalds wrote:
> 
> On Mon, 22 Sep 2008, Masami Hiramatsu wrote:
>> Sure, atomic counter might be more expensive but accurate for ordering.
> 
> Don't be silly.
> 
> An atomic counter is no more accurate for ordering than anything else.
> 
> Why?
> 
> Because all it tells you is the ordering of the atomic increment, not of 
> the caller. The atomic increment is not related to all the other ops that 
> the code that you trace actually does in any shape or form, and so the 
> ordering of the trace doesn't actually imply anything for the ordering of 
> the operations you are tracing!
> 
> Except for a single CPU, of course, but for that case you don't need a 
> sequence number either, since the ordering is entirely determined by the 
> ring buffer itself.
> 
> So the counter will be more expensive (cross-cpu cache bouncing for EVERY 
> SINGLE EVENT), less useful (no real meaning for people who DO want to have 
> a timestamp), and it's really no more "ordered" than anything that bases 
> itself on a TSC.
> 
> The fact is, you cannot order operations based on log messages unless you 
> have a lock around the whole caller - absolutely _no_ amount of locking or 
> atomic accesses in the log itself will guarantee ordering of the upper 
> layers.

Indeed.
If TSC(or similar time counter) can provide synchronized-time, I don't
have any comment on that(AFAIK, latest x86 and ia64 can provide it).
# I might be a bit nervous about Broken TSC...

> And sure, if you have locking at a higher layer, then a sequence number is 
> sufficient, but on the other hand, so is a well-synchronized TSC.
> 
> So personally, I think that the optimal solution is:
> 
>  - let each ring buffer be associated with a "gettimestamp()" function, so 
>    that everybody _can_ set it to something of their own. But default to 
>    something sane, namely a raw TSC thing.

I agree, default to TSC is enough.

>  - Add synchronization events to the ring buffer often enough that you can 
>    make do with a _raw_ (ie unscaled) 32-bit timestamp. Possibly by simply 
>    noticing when the upper 32 bits change, although you could possibly do 
>    it with a heartbeat too.
> 
>  - Similarly, add a synchronization event when the TSC frequency changes.
> 
>  - Make the synchronization packet contain the full 64-bit TSC base, in 
>    addition to TSC frequency info _and_ the timebase.
> 
>  - From those synchronization events, you should be able to get a very 
>    accurate timestamp *after* the fact from the raw TSC numbers (ie do all 
>    the scaling not when you gather the info, but when you present it), 
>    even if you only spent 32 bits of TSC info on 99% of all events (an 
>    just had a overflow log occasionally to get the rest of the info)
> 
>  - Most people will be _way_ happier with a timestamp that has enough 
>    precision to also show ordering (assuming that the caller holds a 
>    lock over the operation _including_ the tracing) than they would ever 
>    be with a sequence number.
> 
>  - people who really want to can consider the incrementing counter a TSC, 
>    but it will suck in so many ways that I bet it will not be very popular 
>    at all. But having the option to set a special timestamp function will
>    give people the option (on a per-buffer level) to make the "TSC" be a 
>    simple incrementing 32-bit counter using xaddl and the upper bits 
>    incrementing from a timer, but keep that as a "ok, the TSC is really 
>    broken, or this architecture doesn't support any fast cycle counters at 
>    all, or I really don't care about time, just sequence, and I guarantee 
>    I have a single lock in all callers that makes things unambiguous"

Thank you very much for giving me a good idea!
I agree with you.

> Note the "single lock" part. It's not enough that you make any trace thing 
> under a lock. They must be under the _same_ lock for all relevant events 
> for you to be able to say anything about ordering. And that's actually 
> pretty rare for any complex behavior.
> 
> The timestamping, btw, is likely the most important part of the whole 
> logging thing. So we need to get it right. But by "right" I mean really 
> really low-latency so that it's acceptable to everybody, real-time enough 
> that you can tell how far apart events were, and precise enough that you 
> really _can_ see ordering.
> 
> The "raw TSC value with correction information" should be able to give you 
> all of that. At least on x86. On some platforms, the TSC may not give you 
> enough resolution to get reasonable guesses on event ordering.
> 
> 			Linus

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


  parent reply	other threads:[~2008-09-23  3:07 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-19 21:33 Unified tracing buffer Martin Bligh
2008-09-19 21:42 ` Randy Dunlap
2008-09-19 21:57   ` Martin Bligh
2008-09-19 22:41     ` Olaf Dabrunz
2008-09-19 22:19       ` Martin Bligh
2008-09-20  8:10         ` Olaf Dabrunz
2008-09-20  8:29         ` Steven Rostedt
2008-09-20 11:40           ` Mathieu Desnoyers
2008-09-20  8:26     ` Steven Rostedt
2008-09-20 11:44       ` Mathieu Desnoyers
2008-09-19 22:28 ` Olaf Dabrunz
2008-09-19 22:09   ` Martin Bligh
2008-09-19 23:18 ` Frank Ch. Eigler
2008-09-20  8:50   ` Steven Rostedt
2008-09-20 13:37     ` Mathieu Desnoyers
2008-09-20 13:51       ` Steven Rostedt
2008-09-20 14:54         ` Steven Rostedt
2008-09-22 18:45           ` Mathieu Desnoyers
2008-09-22 21:39             ` Steven Rostedt
2008-09-23  3:27               ` Mathieu Desnoyers
2008-09-20  0:07 ` Peter Zijlstra
2008-09-22 14:07   ` K.Prasad
2008-09-22 14:45     ` Peter Zijlstra
2008-09-22 16:29       ` Martin Bligh
2008-09-22 16:36         ` Peter Zijlstra
2008-09-22 20:50           ` Masami Hiramatsu
2008-09-23  3:05           ` Mathieu Desnoyers
2008-09-23  2:49       ` Mathieu Desnoyers
2008-09-23  5:25       ` Tom Zanussi
2008-09-23  9:31         ` Peter Zijlstra
2008-09-23 18:13           ` Mathieu Desnoyers
2008-09-23 18:33             ` Christoph Lameter
2008-09-23 18:56               ` Linus Torvalds
2008-09-23 13:50         ` Mathieu Desnoyers
2008-09-23 14:00         ` Martin Bligh
2008-09-23 17:55           ` K.Prasad
2008-09-23 18:27             ` Martin Bligh
2008-09-24  3:50           ` Tom Zanussi
2008-09-24  5:42             ` K.Prasad
2008-09-25  6:07             ` [RFC PATCH 0/8] current relay cleanup patchset Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 1/8] relay - Clean up relay_switch_subbuf() and make waking up consumers optional Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 2/8] relay - Make the relay sub-buffer switch code replaceable Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 3/8] relay - Add channel flags to relay, remove global callback param Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 4/8] relay - Add reserved param to switch-subbuf, in preparation for non-pad write/reserve Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 5/8] relay - Map the first sub-buffer at the end of the buffer, for temporary convenience Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 6/8] relay - Replace relay_reserve/relay_write with non-padded versions Tom Zanussi
2008-09-25  6:07             ` [RFC PATCH 7/8] relay - Remove padding-related code from relay_read()/relay_splice_read() et al Tom Zanussi
2008-09-25  6:08             ` [RFC PATCH 8/8] relay - Clean up remaining padding-related junk Tom Zanussi
2008-09-23  5:27       ` [PATCH 1/3] relay - clean up subbuf switch Tom Zanussi
2008-09-23 20:15         ` Andrew Morton
2008-09-23  5:27       ` [PATCH 2/3] relay - make subbuf switch replaceable Tom Zanussi
2008-09-23 20:17         ` Andrew Morton
2008-09-23  5:27       ` [PATCH 3/3] relay - add channel flags Tom Zanussi
2008-09-23 20:20         ` Andrew Morton
2008-09-24  3:57           ` Tom Zanussi
2008-09-20  0:26 ` Unified tracing buffer Marcel Holtmann
2008-09-20  9:03 ` Steven Rostedt
2008-09-20 13:55   ` Mathieu Desnoyers
2008-09-20 14:12     ` Arjan van de Ven
2008-09-22 18:52       ` Mathieu Desnoyers
2008-10-02 15:28         ` Jason Baron
2008-10-03 16:11           ` Mathieu Desnoyers
2008-10-03 18:37             ` Jason Baron
2008-10-03 19:10               ` Mathieu Desnoyers
2008-10-03 19:25                 ` Jason Baron
2008-10-03 19:56                   ` Mathieu Desnoyers
2008-10-03 20:25                     ` Jason Baron
2008-10-03 21:52                 ` Frank Ch. Eigler
2008-09-22  3:09     ` KOSAKI Motohiro
2008-09-22  9:57   ` Peter Zijlstra
2008-09-23  2:36     ` Mathieu Desnoyers
2008-09-22 13:57 ` K.Prasad
2008-09-22 19:45 ` Masami Hiramatsu
2008-09-22 20:13   ` Martin Bligh
2008-09-22 22:25     ` Masami Hiramatsu
2008-09-22 23:11       ` Darren Hart
2008-09-23  0:04         ` Masami Hiramatsu
2008-09-22 23:16       ` Martin Bligh
2008-09-23  0:05         ` Masami Hiramatsu
2008-09-23  0:12           ` Martin Bligh
2008-09-23 14:49             ` Masami Hiramatsu
2008-09-23 15:04               ` Mathieu Desnoyers
2008-09-23 15:30                 ` Masami Hiramatsu
2008-09-23 16:01                   ` Linus Torvalds
2008-09-23 17:04                     ` Masami Hiramatsu
2008-09-23 17:30                       ` Thomas Gleixner
2008-09-23 18:59                         ` Masami Hiramatsu
2008-09-23 19:36                           ` Thomas Gleixner
2008-09-23 19:38                             ` Martin Bligh
2008-09-23 19:41                               ` Thomas Gleixner
2008-09-23 19:50                                 ` Martin Bligh
2008-09-23 20:03                                   ` Thomas Gleixner
2008-09-23 21:02                                     ` Martin Bligh
2008-09-23 20:03                             ` Masami Hiramatsu
2008-09-23 20:08                               ` Thomas Gleixner
2008-09-23 15:46               ` Linus Torvalds
2008-09-23  0:39           ` Linus Torvalds
2008-09-23  1:26             ` Roland Dreier
2008-09-23  1:39               ` Steven Rostedt
2008-09-23  2:02               ` Mathieu Desnoyers
2008-09-23  2:26                 ` Darren Hart
2008-09-23  2:31                   ` Mathieu Desnoyers
2008-09-23  3:26               ` Linus Torvalds
2008-09-23  3:36                 ` Mathieu Desnoyers
2008-09-23  4:05                   ` Linus Torvalds
2008-09-23  3:43                 ` Steven Rostedt
2008-09-23  4:10                   ` Masami Hiramatsu
2008-09-23  4:17                     ` Martin Bligh
2008-09-23 15:23                       ` Masami Hiramatsu
2008-09-23 10:53                     ` Steven Rostedt
2008-09-23  4:19                   ` Linus Torvalds
2008-09-23 14:12                     ` Mathieu Desnoyers
2008-09-23  2:30             ` Mathieu Desnoyers
2008-09-23  3:06             ` Masami Hiramatsu [this message]
2008-09-23 14:36       ` KOSAKI Motohiro
2008-09-23 15:02         ` Frank Ch. Eigler
2008-09-23 15:21         ` Masami Hiramatsu
2008-09-23 17:59           ` KOSAKI Motohiro
2008-09-23 18:28             ` Martin Bligh
2008-09-23  3:33 ` Andi Kleen
2008-09-23  3:47   ` Martin Bligh
2008-09-23  5:04     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48D85D21.7060801@redhat.com \
    --to=mhiramat@redhat.com \
    --cc=compudj@krystal.dyndns.org \
    --cc=darren@dvhart.com \
    --cc=fche@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbligh@google.com \
    --cc=rostedt@goodmis.org \
    --cc=systemtap@sources.redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox