From: "K.Prasad" <prasad@linux.vnet.ibm.com>
To: Martin Bligh <mbligh@google.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Mathieu Desnoyers <compudj@krystal.dyndns.org>,
Steven Rostedt <rostedt@goodmis.org>,
od@novell.com, "Frank Ch. Eigler" <fche@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
hch@lst.de, David Wilder <dwilder@us.ibm.com>,
zanussi@comcast.net
Subject: Re: Unified tracing buffer
Date: Mon, 22 Sep 2008 19:27:23 +0530 [thread overview]
Message-ID: <20080922135723.GA5279@in.ibm.com> (raw)
In-Reply-To: <33307c790809191433w246c0283l55a57c196664ce77@mail.gmail.com>
On Fri, Sep 19, 2008 at 02:33:42PM -0700, Martin Bligh wrote:
> During kernel summit and Plumbers conference, Linus and others
> expressed a desire for a unified
> tracing buffer system for multiple tracing applications (eg ftrace,
> lttng, systemtap, blktrace, etc) to use.
> This provides several advantages, including the ability to interleave
> data from multiple sources,
> not having to learn 200 different tools, duplicated code/effort, etc.
>
With due apologies for pitching-in late, I thought I'd bring visibility to
the two new interfaces - namely relay_printk() and relay_dump() - now a part
of -mm tree (since 2.6.27-rc5-mm1) are meant to address such needs;
although not completely in its present form but quite substantially.
(Refer: Documentation/filesystems/relay.txt). As far as re-usability is
concerned, many parts of this interface are directly adopted from
SystemTap's runtime. Blktrace had been made to work using these interfaces
(http://tinyurl.com/4q9d4p) reducing about ~130 lines of code from the
blktrace related files.
With more effort, say additions such as a)ability to specify custom
names for files b)ability to create user-defined control files (in
addition to what comes default) will make it usable along with tracers
such as ftrace (ref:http://tinyurl.com/3ppbwh) (and is something that I
intended to work upon).
While relay_printk() interface brings a high-level abstract interface
over 'relay' by masking all the setup/tear-down details and the ability
to use per-CPU buffers; relay_dump() is its equivalent that performs
binary dumping through debugfs interface (a requirement for the unified
tracing buffer, as I learn from the email). Also the use of default
file-names, debugfs output path results in huge reduction of setup code
required by the end-user along with the ability to override the defaults
if required in a special case. Examples of the resulting code-brevity can
be seen at samples/relay/*.c in 2.6.27-rc5-mm1 tree.
I am quite sure that with minimal changes to infrastructure underlying
beneath these two interfaces, we can meet out most of the requirements
stated above; and am open for suggestions.
Kindly let me know what the community thinks about the same.
Thanks,
K.Prasad
> Several of us got together last night and tried to cut this down to
> the simplest usable system
> we could agree on (and nobody got hurt!). This will form version 1.
> I've sketched out a few
> enhancements we know that we want, but have agreed to leave these
> until version 2.
> The answer to most questions about the below is "yes we know, we'll
> fix that in version 2"
> (or 3). Simplicity was the rule ...
>
> Sketch of design. Enjoy flaming me. Code will follow shortly.
>
>
> STORAGE
> -------
>
> We will support multiple buffers for different tracing systems, with
> separate names, event id spaces.
> Event ids are 16 bit, dynamically allocated.
> A "one line of text" print function will be provided for each event,
> or use the default (probably hex printf)
> Will provide a "flight data recorder" mode, and a "spool to disk" mode.
>
> Circular buffer per cpu, protected by per-cpu spinlock_irq
> Word aligned records.
> Variable record length, header will start with length record.
> Timestamps in fixed timebase, monotonically increasing (across all CPUs)
>
>
> INPUT_FUNCTIONS
> ---------------
>
> allocate_buffer (name, size)
> return buffer_handle
>
> register_event (buffer_handle, event_id, print_function)
> You can pass in a requested event_id from a fixed set, and
> will be given it, or an error
> 0 means allocate me one dynamically
> returns event_id (or -E_ERROR)
>
> record_event (buffer_handle, event_id, length, *buf)
>
>
> OUTPUT
> ------
>
> Data will be output via debugfs, and provide the following output streams:
>
> /debugfs/tracing/<name>/buffers/text
> clear text stream (will merge the per-cpu streams via insertion
> sort, and use the print functions)
>
> /debugfs/tracing/<name>/buffers/binary[cpu_number]
> per-cpu binary data
>
>
> CONTROL
> -------
>
> Sysfs style tree under debugfs
>
> /debugfs/tracing/<name>/buffers/enabed <--- binary value
>
> /debugfs/tracing/<name>/<event1>
> /debugfs/tracing/<name>/<event2>
> etc ...
> provides a way to enable/disable events, see what's available, and
> what's enabled.
>
>
> KNOWN ISSUES / PLANS
> -------------------
>
> No way to unregister buffers and events.
> Will provide an unregister_buffer and unregister_event call
>
>
> Generating systemwide time is hard on some platforms
> Yes. Time-based output provides a lot of simplicity for the user though
> We won't support these platforms at first, we'll add functionality
> to make it work for them later.
> (plan based on tick-based ms timing, plus counter offset from that
> if needed).
>
> Spinlock_irq is ineffecient, and doesn't support tracing in NMIs
> True. We'll implement a lockless scheme later (see lttng)
>
> Putting a length record in every event is inefficient
> True. Fixed record length with optional extensions is better, but
> more complex. v2.
>
> Putting a full timestamp rather than an offset in every event is inefficient
> See above. True, but v2.
>
> Relayfs already exists! use that!
> People were universally not keen on that idea. Complexity, interface, etc.
> We're also providing some higher level shared functions for time &
> event ids.
>
> There's no way to decode the binary data stream
> Code will be shared from the kernel to decode it, so that we can
> get the compact binary
> format and decode it later. That code will be kept in the kernel
> tree (it's a trivial piece of C).
> Version 1.1 ;-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
next prev parent reply other threads:[~2008-09-22 13:59 UTC|newest]
Thread overview: 122+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-19 21:33 Unified tracing buffer Martin Bligh
2008-09-19 21:42 ` Randy Dunlap
2008-09-19 21:57 ` Martin Bligh
2008-09-19 22:41 ` Olaf Dabrunz
2008-09-19 22:19 ` Martin Bligh
2008-09-20 8:10 ` Olaf Dabrunz
2008-09-20 8:29 ` Steven Rostedt
2008-09-20 11:40 ` Mathieu Desnoyers
2008-09-20 8:26 ` Steven Rostedt
2008-09-20 11:44 ` Mathieu Desnoyers
2008-09-19 22:28 ` Olaf Dabrunz
2008-09-19 22:09 ` Martin Bligh
2008-09-19 23:18 ` Frank Ch. Eigler
2008-09-20 8:50 ` Steven Rostedt
2008-09-20 13:37 ` Mathieu Desnoyers
2008-09-20 13:51 ` Steven Rostedt
2008-09-20 14:54 ` Steven Rostedt
2008-09-22 18:45 ` Mathieu Desnoyers
2008-09-22 21:39 ` Steven Rostedt
2008-09-23 3:27 ` Mathieu Desnoyers
2008-09-20 0:07 ` Peter Zijlstra
2008-09-22 14:07 ` K.Prasad
2008-09-22 14:45 ` Peter Zijlstra
2008-09-22 16:29 ` Martin Bligh
2008-09-22 16:36 ` Peter Zijlstra
2008-09-22 20:50 ` Masami Hiramatsu
2008-09-23 3:05 ` Mathieu Desnoyers
2008-09-23 2:49 ` Mathieu Desnoyers
2008-09-23 5:25 ` Tom Zanussi
2008-09-23 9:31 ` Peter Zijlstra
2008-09-23 18:13 ` Mathieu Desnoyers
2008-09-23 18:33 ` Christoph Lameter
2008-09-23 18:56 ` Linus Torvalds
2008-09-23 13:50 ` Mathieu Desnoyers
2008-09-23 14:00 ` Martin Bligh
2008-09-23 17:55 ` K.Prasad
2008-09-23 18:27 ` Martin Bligh
2008-09-24 3:50 ` Tom Zanussi
2008-09-24 5:42 ` K.Prasad
2008-09-25 6:07 ` [RFC PATCH 0/8] current relay cleanup patchset Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 1/8] relay - Clean up relay_switch_subbuf() and make waking up consumers optional Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 2/8] relay - Make the relay sub-buffer switch code replaceable Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 3/8] relay - Add channel flags to relay, remove global callback param Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 4/8] relay - Add reserved param to switch-subbuf, in preparation for non-pad write/reserve Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 5/8] relay - Map the first sub-buffer at the end of the buffer, for temporary convenience Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 6/8] relay - Replace relay_reserve/relay_write with non-padded versions Tom Zanussi
2008-09-25 6:07 ` [RFC PATCH 7/8] relay - Remove padding-related code from relay_read()/relay_splice_read() et al Tom Zanussi
2008-09-25 6:08 ` [RFC PATCH 8/8] relay - Clean up remaining padding-related junk Tom Zanussi
2008-09-23 5:27 ` [PATCH 1/3] relay - clean up subbuf switch Tom Zanussi
2008-09-23 20:15 ` Andrew Morton
2008-09-23 5:27 ` [PATCH 2/3] relay - make subbuf switch replaceable Tom Zanussi
2008-09-23 20:17 ` Andrew Morton
2008-09-23 5:27 ` [PATCH 3/3] relay - add channel flags Tom Zanussi
2008-09-23 20:20 ` Andrew Morton
2008-09-24 3:57 ` Tom Zanussi
2008-09-20 0:26 ` Unified tracing buffer Marcel Holtmann
2008-09-20 9:03 ` Steven Rostedt
2008-09-20 13:55 ` Mathieu Desnoyers
2008-09-20 14:12 ` Arjan van de Ven
2008-09-22 18:52 ` Mathieu Desnoyers
2008-10-02 15:28 ` Jason Baron
2008-10-03 16:11 ` Mathieu Desnoyers
2008-10-03 18:37 ` Jason Baron
2008-10-03 19:10 ` Mathieu Desnoyers
2008-10-03 19:25 ` Jason Baron
2008-10-03 19:56 ` Mathieu Desnoyers
2008-10-03 20:25 ` Jason Baron
2008-10-03 21:52 ` Frank Ch. Eigler
2008-09-22 3:09 ` KOSAKI Motohiro
2008-09-22 9:57 ` Peter Zijlstra
2008-09-23 2:36 ` Mathieu Desnoyers
2008-09-22 13:57 ` K.Prasad [this message]
2008-09-22 19:45 ` Masami Hiramatsu
2008-09-22 20:13 ` Martin Bligh
2008-09-22 22:25 ` Masami Hiramatsu
2008-09-22 23:11 ` Darren Hart
2008-09-23 0:04 ` Masami Hiramatsu
2008-09-22 23:16 ` Martin Bligh
2008-09-23 0:05 ` Masami Hiramatsu
2008-09-23 0:12 ` Martin Bligh
2008-09-23 14:49 ` Masami Hiramatsu
2008-09-23 15:04 ` Mathieu Desnoyers
2008-09-23 15:30 ` Masami Hiramatsu
2008-09-23 16:01 ` Linus Torvalds
2008-09-23 17:04 ` Masami Hiramatsu
2008-09-23 17:30 ` Thomas Gleixner
2008-09-23 18:59 ` Masami Hiramatsu
2008-09-23 19:36 ` Thomas Gleixner
2008-09-23 19:38 ` Martin Bligh
2008-09-23 19:41 ` Thomas Gleixner
2008-09-23 19:50 ` Martin Bligh
2008-09-23 20:03 ` Thomas Gleixner
2008-09-23 21:02 ` Martin Bligh
2008-09-23 20:03 ` Masami Hiramatsu
2008-09-23 20:08 ` Thomas Gleixner
2008-09-23 15:46 ` Linus Torvalds
2008-09-23 0:39 ` Linus Torvalds
2008-09-23 1:26 ` Roland Dreier
2008-09-23 1:39 ` Steven Rostedt
2008-09-23 2:02 ` Mathieu Desnoyers
2008-09-23 2:26 ` Darren Hart
2008-09-23 2:31 ` Mathieu Desnoyers
2008-09-23 3:26 ` Linus Torvalds
2008-09-23 3:36 ` Mathieu Desnoyers
2008-09-23 4:05 ` Linus Torvalds
2008-09-23 3:43 ` Steven Rostedt
2008-09-23 4:10 ` Masami Hiramatsu
2008-09-23 4:17 ` Martin Bligh
2008-09-23 15:23 ` Masami Hiramatsu
2008-09-23 10:53 ` Steven Rostedt
2008-09-23 4:19 ` Linus Torvalds
2008-09-23 14:12 ` Mathieu Desnoyers
2008-09-23 2:30 ` Mathieu Desnoyers
2008-09-23 3:06 ` Masami Hiramatsu
2008-09-23 14:36 ` KOSAKI Motohiro
2008-09-23 15:02 ` Frank Ch. Eigler
2008-09-23 15:21 ` Masami Hiramatsu
2008-09-23 17:59 ` KOSAKI Motohiro
2008-09-23 18:28 ` Martin Bligh
2008-09-23 3:33 ` Andi Kleen
2008-09-23 3:47 ` Martin Bligh
2008-09-23 5:04 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080922135723.GA5279@in.ibm.com \
--to=prasad@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=compudj@krystal.dyndns.org \
--cc=dwilder@us.ibm.com \
--cc=fche@redhat.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mbligh@google.com \
--cc=od@novell.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=zanussi@comcast.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox