Date: Mon, 22 Sep 2008 21:05:20 -0700 (PDT)
From: Linus Torvalds
To: Mathieu Desnoyers
cc: Roland Dreier, Masami Hiramatsu, Martin Bligh, Linux Kernel Mailing List,
    Thomas Gleixner, Steven Rostedt, darren@dvhart.com, "Frank Ch. Eigler",
    systemtap-ml
Subject: Re: Unified tracing buffer
In-Reply-To: <20080923033635.GK24937@Krystal>

On Mon, 22 Sep 2008, Mathieu Desnoyers wrote:
>
> Unless I am missing something, in the case we use an atomic operation
> which implies memory barriers (cmpxchg and atomic_add_return does), one
> can be sure that all memory operations done before the barrier are
> completed at the barrier and that all memory ops following the barrier
> will happen after.

Sure (if you have a barrier - not all architectures will imply that for an
increment).

But that still doesn't mean a thing.

You have two events (a) and (b), and you put trace-points on each. In your
trace, you see (a) before (b) by comparing the numbers. But what does that
mean?

The actual event that you traced is not the trace-point - the trace-point
is more like a fancy "printk". And the fact that one showed up before
another in the trace buffer doesn't mean that the events _around_ the
trace happened in the same order.

You can use the barriers to make a partial ordering, and if you have a
separate tracepoint for entry into a region and exit, you can perhaps
show that they were totally disjoint. Or maybe they were partially
overlapping, and you'll never know exactly how they overlapped.

Example:

	trace(..);
	do_X();

being executed on two different CPU's. In the trace, CPU#1 was before
CPU#2. Does that mean that "do_X()" happened first on CPU#1?

No. The only way to show that would be to put a lock around the whole
trace _and_ operation X, ie

	spin_lock(..);
	trace(..);
	do_X();
	spin_unlock(..);

and now, if CPU#1 shows up in the trace first, then you know that do_X()
really did happen first on CPU#1. Otherwise you basically know *nothing*,
and the ordering of the trace events was totally and utterly meaningless.

See? Trace events themselves may be ordered, but the point of the trace
event is never to know the ordering of the trace itself - it's to know
the ordering of the code we're interested in tracing. The ordering of the
trace events themselves is irrelevant and not useful.

And I'd rather see people _understand_ that than have them think the
ordering is somehow something they can trust.
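A minimal kernel-style C sketch of the two cases above. trace() and do_X()
are just the stand-ins from the pseudo-code in this mail, not real tracing
APIs; the point is that only the locked variant lets buffer order be read
back as the order of do_X() itself:

#include <linux/spinlock.h>

/* Hypothetical stand-ins for the tracer and the traced operation. */
extern void trace(void);
extern void do_X(void);

static DEFINE_SPINLOCK(x_lock);

/* Unlocked: buffer order says nothing about which CPU ran do_X() first. */
static void traced_unlocked(void)
{
	trace();
	do_X();
}

/* Locked: the CPU that appears first in the buffer really did do_X() first. */
static void traced_locked(void)
{
	spin_lock(&x_lock);
	trace();
	do_X();
	spin_unlock(&x_lock);
}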
Btw, if you _do_ have locking, then you can also know that the "do_X()"
operations will be essentially as far apart in some theoretical notion of
"time" (let's imagine that we do have global time, even if we don't) as
the cost of the trace operation and do_X() itself.

So if we _do_ have locking (and thus a valid ordering that actually can
matter), then the TSC doesn't even have to be synchronized on a cycle
basis across CPU's - it just needs to be close enough that you can tell
which one happened first (and with ordering, that's a valid thing to do).

So you don't even need "perfect" synchronization, you just need something
reasonably close, and you'll be able to see ordering from TSC counts
without having that horrible bouncing cross-CPU thing that will impact
performance a lot.

Quite frankly, I suspect that anybody who wants to have a global counter
might as well almost just have a global ring-buffer. The trace events
aren't going to be CPU-local anyway if you need to always update a shared
cacheline - and you might as well make the shared cacheline be the ring
buffer head with a spinlock in it.

That may not be _quite_ true, but it's probably close enough.

		Linus
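For illustration, a minimal sketch of the global ring buffer described in
the last paragraph: one spinlock-protected head that every CPU writes
through, stamped with a TSC that only needs to be roughly synchronized.
The names, sizes and layout here are hypothetical, not an existing kernel
structure:

#include <linux/cache.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <asm/timex.h>

#define RB_SIZE 4096			/* entries; power of two, made up */

struct trace_entry {
	cycles_t	tsc;		/* "close enough" timestamp */
	unsigned long	data;
};

/* One global buffer: the shared cacheline *is* the buffer head. */
struct global_ring {
	spinlock_t	lock;
	unsigned long	head;
	struct trace_entry entries[RB_SIZE];
} ____cacheline_aligned;

static struct global_ring ring = {
	.lock = __SPIN_LOCK_UNLOCKED(ring.lock),
};

static void ring_write(unsigned long data)
{
	unsigned long idx;

	spin_lock(&ring.lock);
	idx = ring.head++ & (RB_SIZE - 1);
	ring.entries[idx].tsc  = get_cycles();
	ring.entries[idx].data = data;
	spin_unlock(&ring.lock);
}

The lock and the head share a cacheline deliberately, per the argument
above: if every event already bounces that cacheline for a global counter,
making it the buffer head costs little extra.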