From: "K.Prasad" <prasad@linux.vnet.ibm.com>
To: Martin Bligh <mbligh@google.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Mathieu Desnoyers <compudj@krystal.dyndns.org>,
Steven Rostedt <rostedt@goodmis.org>,
od@novell.com, "Frank Ch. Eigler" <fche@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
hch@lst.de, David Wilder <dwilder@us.ibm.com>,
zanussi@comcast.net
Subject: Re: Unified tracing buffer
Date: Mon, 22 Sep 2008 19:27:23 +0530 [thread overview]
Message-ID: <20080922135723.GA5279@in.ibm.com> (raw)
In-Reply-To: <33307c790809191433w246c0283l55a57c196664ce77@mail.gmail.com>
On Fri, Sep 19, 2008 at 02:33:42PM -0700, Martin Bligh wrote:
> During kernel summit and Plumbers conference, Linus and others
> expressed a desire for a unified
> tracing buffer system for multiple tracing applications (eg ftrace,
> lttng, systemtap, blktrace, etc) to use.
> This provides several advantages, including the ability to interleave
> data from multiple sources,
> not having to learn 200 different tools, duplicated code/effort, etc.
>
With due apologies for pitching-in late, I thought I'd bring visibility to
the two new interfaces - namely relay_printk() and relay_dump() - now a part
of -mm tree (since 2.6.27-rc5-mm1) are meant to address such needs;
although not completely in its present form but quite substantially.
(Refer: Documentation/filesystems/relay.txt). As far as re-usability is
concerned, many parts of this interface are directly adopted from
SystemTap's runtime. Blktrace had been made to work using these interfaces
(http://tinyurl.com/4q9d4p) reducing about ~130 lines of code from the
blktrace related files.
With more effort, say additions such as a)ability to specify custom
names for files b)ability to create user-defined control files (in
addition to what comes default) will make it usable along with tracers
such as ftrace (ref:http://tinyurl.com/3ppbwh) (and is something that I
intended to work upon).
While relay_printk() interface brings a high-level abstract interface
over 'relay' by masking all the setup/tear-down details and the ability
to use per-CPU buffers; relay_dump() is its equivalent that performs
binary dumping through debugfs interface (a requirement for the unified
tracing buffer, as I learn from the email). Also the use of default
file-names, debugfs output path results in huge reduction of setup code
required by the end-user along with the ability to override the defaults
if required in a special case. Examples of the resulting code-brevity can
be seen at samples/relay/*.c in 2.6.27-rc5-mm1 tree.
I am quite sure that with minimal changes to infrastructure underlying
beneath these two interfaces, we can meet out most of the requirements
stated above; and am open for suggestions.
Kindly let me know what the community thinks about the same.
Thanks,
K.Prasad
> Several of us got together last night and tried to cut this down to
> the simplest usable system
> we could agree on (and nobody got hurt!). This will form version 1.
> I've sketched out a few
> enhancements we know that we want, but have agreed to leave these
> until version 2.
> The answer to most questions about the below is "yes we know, we'll
> fix that in version 2"
> (or 3). Simplicity was the rule ...
>
> Sketch of design. Enjoy flaming me. Code will follow shortly.
>
>
> STORAGE
> -------
>
> We will support multiple buffers for different tracing systems, with
> separate names, event id spaces.
> Event ids are 16 bit, dynamically allocated.
> A "one line of text" print function will be provided for each event,
> or use the default (probably hex printf)
> Will provide a "flight data recorder" mode, and a "spool to disk" mode.
>
> Circular buffer per cpu, protected by per-cpu spinlock_irq
> Word aligned records.
> Variable record length, header will start with length record.
> Timestamps in fixed timebase, monotonically increasing (across all CPUs)
>
>
> INPUT_FUNCTIONS
> ---------------
>
> allocate_buffer (name, size)
> return buffer_handle
>
> register_event (buffer_handle, event_id, print_function)
> You can pass in a requested event_id from a fixed set, and
> will be given it, or an error
> 0 means allocate me one dynamically
> returns event_id (or -E_ERROR)
>
> record_event (buffer_handle, event_id, length, *buf)
>
>
> OUTPUT
> ------
>
> Data will be output via debugfs, and provide the following output streams:
>
> /debugfs/tracing/<name>/buffers/text
> clear text stream (will merge the per-cpu streams via insertion
> sort, and use the print functions)
>
> /debugfs/tracing/<name>/buffers/binary[cpu_number]
> per-cpu binary data
>
>
> CONTROL
> -------
>
> Sysfs style tree under debugfs
>
> /debugfs/tracing/<name>/buffers/enabed <--- binary value
>
> /debugfs/tracing/<name>/<event1>
> /debugfs/tracing/<name>/<event2>
> etc ...
> provides a way to enable/disable events, see what's available, and
> what's enabled.
>
>
> KNOWN ISSUES / PLANS
> -------------------
>
> No way to unregister buffers and events.
> Will provide an unregister_buffer and unregister_event call
>
>
> Generating systemwide time is hard on some platforms
> Yes. Time-based output provides a lot of simplicity for the user though
> We won't support these platforms at first, we'll add functionality
> to make it work for them later.
> (plan based on tick-based ms timing, plus counter offset from that
> if needed).
>
> Spinlock_irq is ineffecient, and doesn't support tracing in NMIs
> True. We'll implement a lockless scheme later (see lttng)
>
> Putting a length record in every event is inefficient
> True. Fixed record length with optional extensions is better, but
> more complex. v2.
>
> Putting a full timestamp rather than an offset in every event is inefficient
> See above. True, but v2.
>
> Relayfs already exists! use that!
> People were universally not keen on that idea. Complexity, interface, etc.
> We're also providing some higher level shared functions for time &
> event ids.
>
> There's no way to decode the binary data stream
> Code will be shared from the kernel to decode it, so that we can
> get the compact binary
> format and decode it later. That code will be kept in the kernel
> tree (it's a trivial piece of C).
> Version 1.1 ;-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
next prev parent reply other threads:[~2008-09-22 13:59 UTC|newest]
Thread overview: 125+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-19 21:33 Unified tracing buffer Martin Bligh
2008-09-19 21:42 ` Randy Dunlap
2008-09-19 21:57 ` Martin Bligh
2008-09-19 22:41 ` Olaf Dabrunz
2008-09-19 22:19 ` Martin Bligh
2008-09-20 8:10 ` Olaf Dabrunz
2008-09-20 8:29 ` Steven Rostedt
2008-09-20 11:40 ` Mathieu Desnoyers
2008-09-20 8:26 ` Steven Rostedt
2008-09-20 11:44 ` Mathieu Desnoyers
2008-09-19 22:28 ` Olaf Dabrunz
2008-09-19 22:09 ` Martin Bligh
2008-09-19 23:18 ` Frank Ch. Eigler
2008-09-20 8:50 ` Steven Rostedt
2008-09-20 13:37 ` Mathieu Desnoyers
2008-09-20 13:51 ` Steven Rostedt
2008-09-20 14:54 ` Steven Rostedt
2008-09-22 18:45 ` Mathieu Desnoyers
2008-09-22 21:39 ` Steven Rostedt
2008-09-23 3:27 ` Mathieu Desnoyers
2008-09-20 0:07 ` Peter Zijlstra
2008-09-22 14:07 ` K.Prasad
2008-09-22 14:45 ` Peter Zijlstra
2008-09-22 16:29 ` Martin Bligh
2008-09-22 16:36 ` Peter Zijlstra
2008-09-22 20:50 ` Masami Hiramatsu
2008-09-23 3:05 ` Mathieu Desnoyers
2008-09-23 2:49 ` Mathieu Desnoyers
2008-09-23 5:25 ` Tom Zanussi
2008-09-23 9:31 ` Peter Zijlstra
2008-09-23 18:13 ` Mathieu Desnoyers
2008-09-23 18:13 ` Mathieu Desnoyers
2008-09-23 18:33 ` Christoph Lameter
2008-09-23 18:33 ` Christoph Lameter
2008-09-23 18:56 ` Linus Torvalds
2008-09-23 18:56 ` Linus Torvalds
2008-09-23 13:50 ` Mathieu Desnoyers
2008-09-23 14:00 ` Martin Bligh
2008-09-23 17:55 ` K.Prasad
2008-09-23 18:27 ` Martin Bligh
2008-09-24 3:50 ` Tom Zanussi
2008-09-24 5:42 ` K.Prasad
2008-09-25 6:07 ` [RFC PATCH 0/8] current relay cleanup patchset Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 1/8] relay - Clean up relay_switch_subbuf() and make waking up consumers optional Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 2/8] relay - Make the relay sub-buffer switch code replaceable Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 3/8] relay - Add channel flags to relay, remove global callback param Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 4/8] relay - Add reserved param to switch-subbuf, in preparation for non-pad write/reserve Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 5/8] relay - Map the first sub-buffer at the end of the buffer, for temporary convenience Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 6/8] relay - Replace relay_reserve/relay_write with non-padded versions Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 7/8] relay - Remove padding-related code from relay_read()/relay_splice_read() et al Tom Zanussi
2008-09-25 6:08 ` [RFC PATCH 8/8] relay - Clean up remaining padding-related junk Tom Zanussi
2008-09-23 5:27 ` [PATCH 1/3] relay - clean up subbuf switch Tom Zanussi
2008-09-23 20:15 ` Andrew Morton
2008-09-23 5:27 ` [PATCH 2/3] relay - make subbuf switch replaceable Tom Zanussi
2008-09-23 20:17 ` Andrew Morton
2008-09-23 5:27 ` [PATCH 3/3] relay - add channel flags Tom Zanussi
2008-09-23 20:20 ` Andrew Morton
2008-09-24 3:57 ` Tom Zanussi
2008-09-20 0:26 ` Unified tracing buffer Marcel Holtmann
2008-09-20 9:03 ` Steven Rostedt
2008-09-20 13:55 ` Mathieu Desnoyers
2008-09-20 14:12 ` Arjan van de Ven
2008-09-22 18:52 ` Mathieu Desnoyers
2008-10-02 15:28 ` Jason Baron
2008-10-03 16:11 ` Mathieu Desnoyers
2008-10-03 18:37 ` Jason Baron
2008-10-03 19:10 ` Mathieu Desnoyers
2008-10-03 19:25 ` Jason Baron
2008-10-03 19:56 ` Mathieu Desnoyers
2008-10-03 20:25 ` Jason Baron
2008-10-03 21:52 ` Frank Ch. Eigler
2008-09-22 3:09 ` KOSAKI Motohiro
2008-09-22 9:57 ` Peter Zijlstra
2008-09-23 2:36 ` Mathieu Desnoyers
2008-09-22 13:57 ` K.Prasad [this message]
2008-09-22 19:45 ` Masami Hiramatsu
2008-09-22 20:13 ` Martin Bligh
2008-09-22 22:25 ` Masami Hiramatsu
2008-09-22 23:11 ` Darren Hart
2008-09-23 0:04 ` Masami Hiramatsu
2008-09-22 23:16 ` Martin Bligh
2008-09-23 0:05 ` Masami Hiramatsu
2008-09-23 0:12 ` Martin Bligh
2008-09-23 14:49 ` Masami Hiramatsu
2008-09-23 15:04 ` Mathieu Desnoyers
2008-09-23 15:30 ` Masami Hiramatsu
2008-09-23 16:01 ` Linus Torvalds
2008-09-23 17:04 ` Masami Hiramatsu
2008-09-23 17:30 ` Thomas Gleixner
2008-09-23 18:59 ` Masami Hiramatsu
2008-09-23 19:36 ` Thomas Gleixner
2008-09-23 19:38 ` Martin Bligh
2008-09-23 19:41 ` Thomas Gleixner
2008-09-23 19:50 ` Martin Bligh
2008-09-23 20:03 ` Thomas Gleixner
2008-09-23 21:02 ` Martin Bligh
2008-09-23 20:03 ` Masami Hiramatsu
2008-09-23 20:08 ` Thomas Gleixner
2008-09-23 15:46 ` Linus Torvalds
2008-09-23 0:39 ` Linus Torvalds
2008-09-23 1:26 ` Roland Dreier
2008-09-23 1:39 ` Steven Rostedt
2008-09-23 2:02 ` Mathieu Desnoyers
2008-09-23 2:26 ` Darren Hart
2008-09-23 2:31 ` Mathieu Desnoyers
2008-09-23 3:26 ` Linus Torvalds
2008-09-23 3:36 ` Mathieu Desnoyers
2008-09-23 4:05 ` Linus Torvalds
2008-09-23 3:43 ` Steven Rostedt
2008-09-23 4:10 ` Masami Hiramatsu
2008-09-23 4:17 ` Martin Bligh
2008-09-23 15:23 ` Masami Hiramatsu
2008-09-23 10:53 ` Steven Rostedt
2008-09-23 4:19 ` Linus Torvalds
2008-09-23 14:12 ` Mathieu Desnoyers
2008-09-23 2:30 ` Mathieu Desnoyers
2008-09-23 3:06 ` Masami Hiramatsu
2008-09-23 14:36 ` KOSAKI Motohiro
2008-09-23 15:02 ` Frank Ch. Eigler
2008-09-23 15:21 ` Masami Hiramatsu
2008-09-23 17:59 ` KOSAKI Motohiro
2008-09-23 18:28 ` Martin Bligh
2008-09-23 3:33 ` Andi Kleen
2008-09-23 3:47 ` Martin Bligh
2008-09-23 5:04 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080922135723.GA5279@in.ibm.com \
--to=prasad@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=compudj@krystal.dyndns.org \
--cc=dwilder@us.ibm.com \
--cc=fche@redhat.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mbligh@google.com \
--cc=od@novell.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=zanussi@comcast.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.