* [PATCH 1/2] Trace code and documentation
@ 2007-09-13 23:43 David Wilder
2007-09-14 4:12 ` Randy Dunlap
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: David Wilder @ 2007-09-13 23:43 UTC (permalink / raw)
To: linux-kernel, SystemTAP; +Cc: akpm
[-- Attachment #1: Type: text/plain, Size: 1 bytes --]
[-- Attachment #2: trace.patch --]
[-- Type: text/x-patch, Size: 26940 bytes --]
Trace - Provides tracing primitives
Tom Zanussi <zanussi@us.ibm.com>
Martin Hunt <hunt@redhat.com>
David Wilder <dwilder@us.ibm.com>
---
Documentation/trace.txt | 297 ++++++++++++++++++++++++
include/linux/trace.h | 99 ++++++++
lib/Kconfig | 10 +
lib/Makefile | 2 +
lib/trace.c | 575 +++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 983 insertions(+), 0 deletions(-)
diff --git a/Documentation/trace.txt b/Documentation/trace.txt
new file mode 100644
index 0000000..57a5c71
--- /dev/null
+++ b/Documentation/trace.txt
@@ -0,0 +1,297 @@
+Trace Setup and Control
+=======================
+In the kernel, the trace interface provides a simple mechanism for
+starting and managing data channels (traces) to user space. The
+trace interface builds on the relay interface. For a complete
+description of the relay interface, please see:
+Documentation/filesystems/relay.txt.
+
+The trace interface provides a single layer in a complete tracing
+application. Trace provides a kernel API that can be used for the setup
+and control of tracing channels. Users of trace must provide a data layer
+responsible for formatting and writing data into the trace channels.
+
+A layered approach to tracing
+=============================
+A complete kernel tracing application consists of a data provider and
+a data consumer. Both provider and consumer contain three layers; each
+layer works in tandem with the corresponding layer on the opposite side.
+The layers are represented in the following diagram.
+
+Provider Data layer
+ Formats raw trace data and provides data-related service.
+ For example, adding timestamps used by the consumer to sort data.
+
+Provider Control layer
+ Provided by the trace interface, this layer creates trace channels
+ and informs the data layer and consumer of the current state
+ of the trace channels.
+
+Provider Buffering layer
+ Provided by relay. This layer buffers data in the
+ kernel for consumption by the consumer's buffer
+ layer.
+
+Provider (in-kernel facility)
+-----------------------------------------------------------------------------
+Consumer (user application)
+
+
+Consumer Buffer layer
+ Reads/consumes data from the provider's data buffers.
+
+Consumer Control layer
+ Communicates to the provider's control layer to control the state
+ of the trace channels.
+
+Consumer Data layer
+ Sorts and formats data as provided by the provider's data layer.
+
+The provider is coded as a kernel facility. The consumer is coded as
+a user application.
+
+
+Trace - Features
+================
+Trace exploits services and features provided by relay. These features
+are:
+- The creation and destruction of relay channels.
+- Buffer management. Overwrite or non-overwrite modes can be selected
+ as well as global or per-CPU buffering.
+
+Overwrite mode can be called "flight recorder mode". Flight recorder
+mode is selected by setting the TRACE_FLIGHT_CHANNEL flag when
+creating trace channels. In flight mode when a tracing buffer is
+full, the oldest records in the buffer will be discarded to make room
+as new records arrive. In the default non-overwrite mode, new records
+may be written only if the buffer has room. In either case, to
+prevent data loss, a user space reader must keep the buffers
+drained. Trace provides a means to detect the number of records that
+have been dropped due to a buffer-full condition (non-overwrite mode
+only).
+
+When per-CPU buffers are used, relay creates one debugfs file for each
+running CPU. The user-space consumer of the data is responsible for
+reading the per-CPU buffers and collating the records, presumably using
+a time stamp or sequence number included in the trace records. The
+use of global buffers eliminates this extra work of sequencing
+records; however the provider's data layer must hold a lock when
+writing records. The lock prevents writers running on different CPUs
+from overwriting each other's data. However, buffering may be slower
+because writes to the buffer are serialized. Global buffering is
+selected by setting the TRACE_GLOBAL_CHANNEL flag when creating trace
+channels.
+
+Trace User Interface
+====================
+When a trace channel is created and started, the following
+directories and files are created in the root of the mounted debugfs.
+
+/debug (root of the debugfs)
+ /<trace-root-dir>
+ /<trace-name>
+ trace0 ... traceN Per-CPU trace data, one
+ file per CPU.
+
+ state Start or stop tracing by
+ writing the strings
+ "start" or "stop" to this
+ file. Read the file to get the
+ current state.
+
+ dropped The number of records dropped
+ due to a full-buffer condition,
+ for non-TRACE_FLIGHT_CHANNELs
+ only.
+
+ rewind Trigger a rewind by writing
+ to this file, i.e. start the
+ next read at the beginning
+ again. Only available for
+ TRACE_FLIGHT_CHANNELS.
+
+
+ nr_sub Number of sub-buffers
+ in the channel.
+
+ sub_size Size of sub-buffers in
+ the channel.
+
+Trace data is gathered from the trace[0...N] files using one of the
+available interfaces provided by relay.
+
+When using the READ(2) interface, as data is read it is marked as
+consumed by the relay subsystem. Therefore, subsequent reads will
+only return unconsumed data.
+
+Trace Kernel API
+================
+An overview of the trace Kernel API is now given. More details of the
+API can be found in linux/trace.h.
+
+The steps a kernel data provider takes to utilize the trace interface are:
+1) Set up a trace channel - trace_setup()
+2) Start the trace channel - trace_start()
+3) Write one or more trace records into the channel (using the relay API).
+
+ Important: When writing a trace record the provider must ensure that
+ preemption is disabled and that trace state is set to "running". a
+ typical function used to write records into a trace channel should
+ follow this pattern:
+
+ rcu_read_lock(); // disables preemption
+ if (trace_running(trace)){
+ relay_write(....); // use any available relay data function
+ }
+ rcu_read_unlock(); // enables preemption
+
+4) Stop and start tracing as desired - trace_start()/trace_stop()
+5) Destroy the trace channel and underlying relay channel -
+trace_cleanup().
+
+Trace Example
+=============
+
+This small sample module creates a trace channel. It places a kprobe
+on the function do_fork(). The value of current->pid is writen to
+the trace channel.
+
+You can build the kernel module fork_trace.ko using the following
+Makefile:
+------------------------------------CUT-------------------------------------
+obj-m := fork_trace.o
+KDIR := /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+default:
+ $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+ rm -f *.mod.c *.ko *.o
+----------------------------------CUT--------------------------------------
+
+/* fork_trace.c - An example of using trace in a kprobes module */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kprobes.h>
+#include <linux/trace.h>
+
+#define USE_GLOBAL_BUFFERS 1
+#define USE_FLIGHT 1
+
+#define PROBE_POINT "do_fork"
+
+static struct kprobe kp;
+struct trace_info *kprobes_trace;
+
+#ifdef USE_GLOBAL_BUFFERS
+static DEFINE_SPINLOCK(trace_lock);
+#endif
+
+#define TRACE_PRINTF_TMPBUF_SIZE (1024)
+static char trace_tmpbuf[NR_CPUS][TRACE_PRINTF_TMPBUF_SIZE];
+
+void trace_printf(struct trace_info *trace, const char *format, ...)
+{
+ va_list args;
+ void *buf;
+ char *record;
+ int len=0;
+
+ if (!trace)
+ return;
+
+ buf = trace_tmpbuf[smp_processor_id()];
+
+#ifdef USE_GLOBAL_BUFFERS
+ spin_lock(&trace_lock);
+#endif
+
+ rcu_read_lock();
+ if (trace_running(trace)){
+ va_start(args, format);
+ /* vscnprintf() returns the bytes actually stored, so len
+ * cannot exceed the temporary buffer. */
+ len = vscnprintf(buf, TRACE_PRINTF_TMPBUF_SIZE,
+ format, args);
+ va_end(args);
+
+ if ((record = relay_reserve(trace->rchan, len)))
+ memcpy(record, buf, len);
+ }
+ rcu_read_unlock();
+
+#ifdef USE_GLOBAL_BUFFERS
+ spin_unlock(&trace_lock);
+#endif
+}
+
+
+int handler_pre(struct kprobe *p, struct pt_regs *regs)
+{
+ trace_printf(kprobes_trace,"%d\n", current->pid);
+ return 0;
+}
+
+
+int init_module(void)
+{
+ u32 flags=0;
+
+#ifdef USE_GLOBAL_BUFFERS
+ flags |= TRACE_GLOBAL_CHANNEL;
+#endif
+
+#ifdef USE_FLIGHT
+ flags |= TRACE_FLIGHT_CHANNEL;
+#endif
+
+ /* setup the trace */
+ kprobes_trace = trace_setup("trace_example",PROBE_POINT,
+ 1024,8,flags);
+ if (!kprobes_trace)
+ return -1;
+
+ trace_start(kprobes_trace);
+
+ /* setup the kprobe */
+ kp.pre_handler = handler_pre;
+ kp.post_handler = NULL;
+ kp.fault_handler = NULL;
+ kp.symbol_name = PROBE_POINT;
+ if (register_kprobe(&kp) < 0) {
+ printk("fork_trace: register_kprobe failed\n");
+ /* don't leak the trace channel on failure */
+ trace_cleanup(kprobes_trace);
+ return -1;
+ }
+ return 0;
+}
+
+void cleanup_module(void)
+{
+ unregister_kprobe(&kp);
+ trace_stop(kprobes_trace);
+ trace_cleanup(kprobes_trace);
+}
+
+MODULE_LICENSE("GPL");
+-----------------------------CUT-----------------------------------
+How to run the example:
+$ mount -t debugfs debugfs /debug
+$ make
+$ insmod fork_trace.ko
+
+To view the data produced by the module:
+$ cat /debug/trace_example/do_fork/trace0
+
+Remove the module.
+$ rmmod fork_trace
+
+The function trace_cleanup() is called when the module
+is removed. This will cause the TRACE channel to be destroyed and the
+corresponding files to disappear from the debug file system.
+
+Credits
+=======
+Trace is adapted from blktrace authored by Jens Axboe (axboe@suse.de).
+
+Major contributions were made by:
+Tom Zanussi <zanussi@us.ibm.com>
+Martin Hunt <hunt@redhat.com>
+David Wilder <dwilder@us.ibm.com>
diff --git a/include/linux/trace.h b/include/linux/trace.h
new file mode 100644
index 0000000..6dff7d0
--- /dev/null
+++ b/include/linux/trace.h
@@ -0,0 +1,99 @@
+/*
+ * TRACE defines and function prototypes
+ *
+ * Copyright (C) 2006 IBM Inc.
+ *
+ * Tom Zanussi <zanussi@us.ibm.com>
+ * Martin Hunt <hunt@redhat.com>
+ * David Wilder <dwilder@us.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+#ifndef _LINUX_TRACE_H
+#define _LINUX_TRACE_H
+
+#include <linux/relay.h>
+
+/*
+ * TRACE channel flags
+ */
+#define TRACE_GLOBAL_CHANNEL 0x01
+#define TRACE_FLIGHT_CHANNEL 0x02
+#define TRACE_DISABLE_STATE 0x04
+
+enum trace_state {
+ TRACE_SETUP,
+ TRACE_RUNNING,
+ TRACE_STOPPED,
+};
+
+#define TRACE_ROOT_NAME_SIZE 64 /* Max root dir identifier */
+#define TRACE_NAME_SIZE 64 /* Max trace identifier */
+
+/*
+ * Global root user information
+ */
+struct trace_root {
+ struct list_head list;
+ char name[TRACE_ROOT_NAME_SIZE];
+ struct dentry *root;
+ unsigned int users;
+};
+
+/*
+ * Client information
+ */
+struct trace_info {
+ struct mutex state_mutex;
+ enum trace_state state;
+ struct dentry *state_file;
+ struct rchan *rchan;
+ struct dentry *dir;
+ struct dentry *dropped_file;
+ struct dentry *reset_consumed_file;
+ struct dentry *nr_sub_file;
+ struct dentry *sub_size_file;
+ atomic_t dropped;
+ struct trace_root *root;
+ void *private_data;
+ unsigned int flags;
+ unsigned int buf_size;
+ unsigned int buf_nr;
+};
+
+#ifdef CONFIG_TRACE
+static inline int trace_running(struct trace_info *trace)
+{
+ return trace->state == TRACE_RUNNING;
+}
+struct trace_info *trace_setup(const char *root, const char *name,
+ u32 buf_size, u32 buf_nr, u32 flags);
+int trace_start(struct trace_info *trace);
+int trace_stop(struct trace_info *trace);
+void trace_cleanup(struct trace_info *trace);
+#else
+static inline struct trace_info *trace_setup(const char *root,
+ const char *name,u32 buf_size,
+ u32 buf_nr, u32 flags)
+{
+ return NULL;
+}
+static inline int trace_start(struct trace_info *trace) { return -EINVAL; }
+static inline int trace_stop(struct trace_info *trace) { return -EINVAL; }
+static inline int trace_running(struct trace_info *trace) { return 0; }
+static inline void trace_cleanup(struct trace_info *trace) {}
+#endif
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index ba3d104..6d9fffa 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -141,4 +141,14 @@ config HAS_DMA
config CHECK_SIGNATURE
bool
+config TRACE
+ bool "Trace setup and control"
+ select RELAY
+ select DEBUG_FS
+ help
+ This option provides support for the setup, teardown and control
+ of tracing channels from kernel code. It also provides trace
+ information and control to userspace via a set of debugfs control
+ files. If unsure, say N.
+
endmenu
diff --git a/lib/Makefile b/lib/Makefile
index 1e1e8c2..857d7af 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -69,6 +69,8 @@ lib-$(CONFIG_GENERIC_BUG) += bug.o
obj-$(CONFIG_PROFILE_LIKELY) += likely_prof.o
+obj-$(CONFIG_TRACE) += trace.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h
diff --git a/lib/trace.c b/lib/trace.c
new file mode 100644
index 0000000..17c9053
--- /dev/null
+++ b/lib/trace.c
@@ -0,0 +1,575 @@
+/*
+ * Based on blktrace code, Copyright (C) 2006 Jens Axboe <axboe@suse.de>
+ * Moved to utt.c by Tom Zanussi <zanussi@us.ibm.com>, 2006
+ * Additional contributions by:
+ * Martin Hunt <hunt@redhat.com>, 2007
+ * David Wilder <dwilder@us.ibm.com>, 2007
+ * Renamed to trace by David Wilder <dwilder@us.ibm.com>, 2007
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/debugfs.h>
+#include <linux/trace.h>
+
+static LIST_HEAD(trace_roots);
+static DEFINE_MUTEX(trace_mutex);
+
+static int state_open(struct inode *inode, struct file *filp)
+{
+ filp->private_data = inode->i_private;
+
+ return 0;
+}
+
+static ssize_t state_read(struct file *filp, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct trace_info *trace = filp->private_data;
+ char *buf = "trace not started\n";
+
+ if (trace->state == TRACE_STOPPED)
+ buf = "stopped\n";
+ else if (trace->state == TRACE_RUNNING)
+ buf = "running\n";
+ return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
+}
+
+
+static ssize_t state_write(struct file *filp, const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct trace_info *trace = filp->private_data;
+ char buf[16] = { '\0' };
+ int ret;
+
+ if (trace->flags & TRACE_DISABLE_STATE)
+ return -EINVAL;
+
+ if (count > sizeof(buf) - 1)
+ return -EINVAL;
+
+ if (copy_from_user(buf, buffer, count))
+ return -EFAULT;
+
+ buf[count] = '\0';
+
+ if (strncmp(buf, "start", strlen("start")) == 0 ) {
+ ret = trace_start(trace);
+ if (ret)
+ return ret;
+ } else if (strncmp(buf, "stop", strlen("stop")) == 0)
+ trace_stop(trace);
+ else
+ return -EINVAL;
+
+ return count;
+}
+
+
+static struct file_operations state_fops = {
+ .owner = THIS_MODULE,
+ .open = state_open,
+ .read = state_read,
+ .write = state_write,
+};
+
+
+static void remove_root(struct trace_info *trace)
+{
+ if (trace->root->root && simple_empty(trace->root->root)) {
+ debugfs_remove(trace->root->root);
+ list_del(&trace->root->list);
+ kfree(trace->root);
+ trace->root = NULL;
+ }
+}
+
+
+static void remove_tree(struct trace_info *trace)
+{
+ mutex_lock(&trace_mutex);
+
+ debugfs_remove(trace->dir);
+
+ if (--trace->root->users == 0)
+ remove_root(trace);
+
+ mutex_unlock(&trace_mutex);
+}
+
+
+/*
+ * Creates the trace_root if it's not found.
+ */
+static struct trace_root *lookup_root(const char *root)
+{
+ struct list_head *pos;
+ struct trace_root *r;
+
+ list_for_each(pos, &trace_roots) {
+ r = list_entry(pos, struct trace_root, list);
+ if (!strcmp(r->name, root))
+ return r;
+ }
+
+ r = kzalloc(sizeof(struct trace_root), GFP_KERNEL);
+ if (!r)
+ return NULL;
+
+ strlcpy(r->name, root, sizeof(r->name));
+
+ r->root = debugfs_create_dir(root, NULL);
+ if (r->root)
+ list_add(&r->list, &trace_roots);
+
+ return r;
+}
+
+
+static struct dentry *create_tree(struct trace_info *trace, const char *root,
+ const char *name)
+{
+ struct dentry *dir = NULL;
+
+ if (root == NULL || name == NULL)
+ return NULL;
+
+ mutex_lock(&trace_mutex);
+
+ trace->root = lookup_root(root);
+ if (!trace->root)
+ goto err;
+
+ dir = debugfs_create_dir(name, trace->root->root);
+ if (dir)
+ trace->root->users++;
+ else
+ remove_root(trace);
+
+err:
+ mutex_unlock(&trace_mutex);
+
+ return dir;
+}
+
+
+static int dropped_open(struct inode *inode, struct file *filp)
+{
+ filp->private_data = inode->i_private;
+
+ return 0;
+}
+
+
+static ssize_t dropped_read(struct file *filp, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct trace_info *trace = filp->private_data;
+ char buf[16];
+
+ snprintf(buf, sizeof(buf), "%u\n", atomic_read(&trace->dropped));
+
+ return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
+}
+
+
+static struct file_operations dropped_fops = {
+ .owner = THIS_MODULE,
+ .open = dropped_open,
+ .read = dropped_read,
+};
+
+static int reset_consumed_open(struct inode *inode, struct file *filp)
+{
+ filp->private_data = inode->i_private;
+
+ return 0;
+}
+
+static ssize_t reset_consumed_write(struct file *filp,
+ const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ int ret=count;
+ struct trace_info *trace = filp->private_data;
+
+ mutex_lock(&trace->state_mutex);
+ switch (trace->state){
+ case TRACE_RUNNING:
+ trace->state = TRACE_STOPPED;
+ synchronize_rcu();
+ relay_flush(trace->rchan);
+ relay_reset_consumed(trace->rchan);
+ trace->state = TRACE_RUNNING;
+ break;
+ case TRACE_STOPPED:
+ relay_reset_consumed(trace->rchan);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+ mutex_unlock(&trace->state_mutex);
+ return ret;
+}
+
+struct file_operations reset_consumed_fops = {
+ .owner = THIS_MODULE,
+ .open = reset_consumed_open,
+ .write = reset_consumed_write
+};
+
+static int sub_size_open(struct inode *inode, struct file *filp)
+{
+ filp->private_data = inode->i_private;
+
+ return 0;
+}
+
+static ssize_t sub_size_read(struct file *filp, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct trace_info *trace = filp->private_data;
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), "%u\n",
+ (unsigned int)trace->rchan->subbuf_size);
+
+ return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
+}
+
+struct file_operations sub_size_fops = {
+ .owner = THIS_MODULE,
+ .open = sub_size_open,
+ .read = sub_size_read,
+};
+
+static int nr_sub_open(struct inode *inode, struct file *filp)
+{
+ filp->private_data = inode->i_private;
+ return 0;
+}
+
+static ssize_t nr_sub_read(struct file *filp, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct trace_info *trace = filp->private_data;
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), "%u\n",
+ (unsigned int)trace->rchan->n_subbufs);
+
+ return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
+}
+
+struct file_operations nr_sub_fops = {
+ .owner = THIS_MODULE,
+ .open = nr_sub_open,
+ .read = nr_sub_read,
+};
+
+/*
+ * Keep track of how many times we encountered a full subbuffer, to aid
+ * the user space app in telling how many lost events there were.
+ */
+static int subbuf_start_callback(struct rchan_buf *buf, void *subbuf,
+ void *prev_subbuf, size_t prev_padding)
+{
+ struct trace_info *trace = buf->chan->private_data;
+
+ if (trace->flags & TRACE_FLIGHT_CHANNEL)
+ return 1;
+
+ if (!relay_buf_full(buf))
+ return 1;
+
+ atomic_inc(&trace->dropped);
+
+ return 0;
+}
+
+
+static int remove_buf_file_callback(struct dentry *dentry)
+{
+ debugfs_remove(dentry);
+
+ return 0;
+}
+
+
+static struct dentry *create_buf_file_callback(const char *filename,
+ struct dentry *parent, int mode,
+ struct rchan_buf *buf,
+ int *is_global)
+{
+ return debugfs_create_file(filename, mode, parent, buf,
+ &relay_file_operations);
+}
+
+
+static struct dentry *create_global_buf_file_callback(const char *filename,
+ struct dentry *parent,
+ int mode,
+ struct rchan_buf *buf,
+ int *is_global)
+{
+ *is_global = 1;
+
+ return debugfs_create_file(filename, mode, parent, buf,
+ &relay_file_operations);
+}
+
+
+static struct rchan_callbacks relay_callbacks = {
+ .subbuf_start = subbuf_start_callback,
+ .create_buf_file = create_buf_file_callback,
+ .remove_buf_file = remove_buf_file_callback,
+};
+static struct rchan_callbacks relay_callbacks_global = {
+ .subbuf_start = subbuf_start_callback,
+ .create_buf_file = create_global_buf_file_callback,
+ .remove_buf_file = remove_buf_file_callback,
+};
+
+
+static void remove_controls(struct trace_info *trace)
+{
+ if (trace->state_file)
+ debugfs_remove(trace->state_file);
+ if (trace->dropped_file)
+ debugfs_remove(trace->dropped_file);
+ if (trace->reset_consumed_file)
+ debugfs_remove(trace->reset_consumed_file);
+ if (trace->nr_sub_file)
+ debugfs_remove(trace->nr_sub_file);
+ if (trace->sub_size_file)
+ debugfs_remove(trace->sub_size_file);
+ if (trace->dir)
+ remove_tree(trace);
+}
+
+/*
+ * Setup controls for tracing.
+ */
+static struct trace_info *setup_controls(const char *root,
+ const char *name, u32 flags)
+{
+ struct trace_info *trace = NULL;
+
+ trace = kzalloc(sizeof(*trace), GFP_KERNEL);
+ if (!trace)
+ goto err;
+
+ trace->dir = create_tree(trace, root, name);
+ if (!trace->dir)
+ goto err;
+
+ trace->state_file = debugfs_create_file("state", 0444, trace->dir,
+ trace, &state_fops);
+ if (!trace->state_file)
+ goto err;
+
+ if (!(flags & TRACE_FLIGHT_CHANNEL)) {
+ trace->dropped_file = debugfs_create_file("dropped", 0444,
+ trace->dir, trace,
+ &dropped_fops);
+ if (!trace->dropped_file)
+ goto err;
+ }
+
+ if (flags & TRACE_FLIGHT_CHANNEL) {
+ trace->reset_consumed_file = debugfs_create_file("rewind",
+ 0444,
+ trace->dir,
+ trace,
+ &reset_consumed_fops);
+ if (!trace->reset_consumed_file)
+ goto err;
+ }
+
+ trace->nr_sub_file = debugfs_create_file("nr_sub", 0444,
+ trace->dir, trace,
+ &nr_sub_fops);
+ if (!trace->nr_sub_file)
+ goto err;
+
+ trace->sub_size_file = debugfs_create_file("sub_size", 0444,
+ trace->dir, trace,
+ &sub_size_fops);
+ if (!trace->sub_size_file)
+ goto err;
+
+ return trace;
+err:
+ if (trace) {
+ remove_controls(trace);
+ kfree(trace);
+ }
+
+ return NULL;
+}
+
+
+static int trace_setup_channel(struct trace_info *trace, u32 buf_size,
+ u32 buf_nr, u32 flags)
+{
+ if (!buf_size || !buf_nr)
+ return -EINVAL;
+
+ if (flags & TRACE_GLOBAL_CHANNEL)
+ trace->rchan = relay_open("trace", trace->dir, buf_size,
+ buf_nr, &relay_callbacks_global,
+ trace);
+ else
+ trace->rchan = relay_open("trace", trace->dir, buf_size,
+ buf_nr, &relay_callbacks, trace);
+
+ if (!trace->rchan)
+ return -ENOMEM;
+
+ trace->flags = flags;
+ trace->state = TRACE_SETUP;
+
+ return 0;
+}
+
+
+/**
+ * trace_setup: create a new trace handle
+ *
+ * @root: The root directory name in the root of the debugfs
+ * to place trace directories. Created as needed.
+ * @name: Trace directory name, created in @root
+ * @buf_size: size of the relay sub-buffers
+ * @buf_nr: number of relay sub-buffers
+ * @flags: Option selection (see TRACE channel flags definitions)
+ * default values when flags=0 are: use per-CPU buffering,
+ * use non-overwrite mode. See Documentation/trace.txt for details.
+ *
+ * returns a trace_info handle or NULL, if setup failed.
+ */
+struct trace_info *trace_setup(const char *root, const char *name,
+ u32 buf_size, u32 buf_nr, u32 flags)
+{
+ struct trace_info *trace = NULL;
+
+ trace = setup_controls(root, name, flags);
+ if (!trace)
+ return NULL;
+
+ trace->buf_size = buf_size;
+ trace->buf_nr = buf_nr;
+ trace->flags = flags;
+ mutex_init(&trace->state_mutex);
+ trace->state = TRACE_SETUP;
+
+ return trace;
+}
+EXPORT_SYMBOL_GPL(trace_setup);
+
+
+/**
+ * trace_start: start tracing
+ *
+ * @trace: trace handle to start.
+ *
+ * returns 0 if successful.
+ */
+int trace_start(struct trace_info *trace)
+{
+ /*
+ * For starting a trace, we can transition from a setup or stopped
+ * trace.
+ */
+ if (trace->state == TRACE_RUNNING)
+ return -EINVAL;
+
+ mutex_lock(&trace->state_mutex);
+ if (trace->state == TRACE_SETUP) {
+ int ret;
+
+ ret = trace_setup_channel(trace, trace->buf_size,
+ trace->buf_nr, trace->flags);
+ if (ret){
+ mutex_unlock(&trace->state_mutex);
+ return ret;
+ }
+ }
+
+ trace->state = TRACE_RUNNING;
+ mutex_unlock(&trace->state_mutex);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(trace_start);
+
+/**
+ * trace_stop: stop tracing
+ *
+ * @trace: trace handle to stop.
+ *
+ */
+int trace_stop(struct trace_info *trace)
+{
+ int ret = -EINVAL;
+
+ /*
+ * For stopping a trace, the state must be running
+ */
+ mutex_lock(&trace->state_mutex);
+ if (trace->state == TRACE_RUNNING) {
+ trace->state = TRACE_STOPPED;
+ /*
+ * wait for all cpus to see the change in
+ * state before continuing
+ */
+ synchronize_sched();
+ relay_flush(trace->rchan);
+ ret = 0;
+ }
+ mutex_unlock(&trace->state_mutex);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(trace_stop);
+
+/**
+ * trace_cleanup_channel: destroys the trace channel only
+ *
+ * @trace: trace handle to cleanup
+ */
+static void trace_cleanup_channel(struct trace_info *trace)
+{
+ trace_stop(trace);
+ if (trace->rchan)
+ relay_close(trace->rchan);
+ trace->rchan = NULL;
+}
+
+/**
+ * trace_cleanup: destroys the trace channel, control files and dir
+ *
+ * @trace: trace handle to cleanup
+ */
+void trace_cleanup(struct trace_info *trace)
+{
+ trace_cleanup_channel(trace);
+ remove_controls(trace);
+ kfree(trace);
+}
+EXPORT_SYMBOL_GPL(trace_cleanup);
* Re: [PATCH 1/2] Trace code and documentation
2007-09-13 23:43 [PATCH 1/2] Trace code and documentation David Wilder
@ 2007-09-14 4:12 ` Randy Dunlap
2007-09-14 17:50 ` Sam Ravnborg
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Randy Dunlap @ 2007-09-14 4:12 UTC (permalink / raw)
To: David Wilder; +Cc: linux-kernel, SystemTAP, akpm
On Thu, 13 Sep 2007 16:43:16 -0700 David Wilder wrote:
> [it would be easier to review and make sense of the comments
> if the patch were inline instead of attached]
Tom Zanussi <zanussi@us.ibm.com>
Martin Hunt <hunt@redhat.com>
David Wilder <dwilder@us.ibm.com>
Above needs to use Signed-off-by: if you want this merged.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
trace.txt:
+When using the READ(2) interface,
s/READ/read/
+ preemption is disabled and that trace state is set to "running". a
s/. a/. A/
+on the function do_fork(). The value of current->pid is writen to
s/writen/written/
+You can build the kernel module fork_trace.ko using the following
+Makefile:
There was reportedly a discussion about sample source code in the
Documentation/ directory at the kernel summit. Some people want to
move it to the util-linux package. If that's not done, I strongly
prefer that Makefiles and source files be put into their own
sub-directory, not "hidden" inside txt files, so you would end up
with something like Documentation/trace/, with trace.txt, and also
Documentation/trace/src, with Makefile and fork_trace.c & any other
source files.
+Trace is adapted from blktrace authored by Jens Axboe (axboe@suse.de).
MAINTAINERS file says <axboe@kernel.dk>. He's also
<jens.axboe@oracle.com>.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/trace.c:
+ * Based on blktrace code, Copyright (C) 2006 Jens Axboe <axboe@suse.de>
Ditto.
+
+
Use just one blank line between functions and/or structs.
Please check the patch with scripts/checkpatch.pl and then evaluate
its warnings. Sometimes it makes sense to ignore some of them.
+/**
+ * trace_setup: create a new trace trace handle
+ *
+ * @root: The root directory name in the root of the debugfs
+ * to place trace directories. Created as needed.
Thanks for using kernel-doc; however, don't put a blank line between
the function name line and the parameters. Also, the function
name line should have a "-" separating the function name and the
short description. (multiple places in trace.c)
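As an illustration of the layout described above, here is a hypothetical
function (not from the patch; the name clamp_buf_nr and its parameters are
made up for this sketch) with a kernel-doc comment. The function name and
short description share one line separated by " - ", and the parameter
lines follow without a blank line in between:

```c
/**
 * clamp_buf_nr - limit a requested sub-buffer count
 * @requested: number of sub-buffers asked for
 * @max: largest count allowed
 *
 * Returns @requested if it does not exceed @max, otherwise @max.
 */
static unsigned int clamp_buf_nr(unsigned int requested, unsigned int max)
{
	return requested > max ? max : requested;
}
```

The blank " *" line comes only after the last parameter, before any longer
description.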
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
* Re: [PATCH 1/2] Trace code and documentation
2007-09-13 23:43 [PATCH 1/2] Trace code and documentation David Wilder
2007-09-14 4:12 ` Randy Dunlap
@ 2007-09-14 17:50 ` Sam Ravnborg
2007-09-15 4:49 ` David Wilder
2007-09-15 1:08 ` Andrew Morton
2007-09-15 16:01 ` Mathieu Desnoyers
3 siblings, 1 reply; 9+ messages in thread
From: Sam Ravnborg @ 2007-09-14 17:50 UTC (permalink / raw)
To: David Wilder; +Cc: linux-kernel, SystemTAP, akpm
Hi David.
A random comment to the code.
Several of the struct file_operations are not declared static as
they should be.
Btw. it looks good from a coding style point-of-view.
About the name what about ktrace??
Sam
* Re: [PATCH 1/2] Trace code and documentation
2007-09-13 23:43 [PATCH 1/2] Trace code and documentation David Wilder
2007-09-14 4:12 ` Randy Dunlap
2007-09-14 17:50 ` Sam Ravnborg
@ 2007-09-15 1:08 ` Andrew Morton
2007-09-15 1:40 ` Randy Dunlap
2007-09-15 4:47 ` David Wilder
2007-09-15 16:01 ` Mathieu Desnoyers
3 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2007-09-15 1:08 UTC (permalink / raw)
To: David Wilder; +Cc: linux-kernel, SystemTAP
> Trace - Provides tracing primitives
>
> ...
>
> +config TRACE
> + bool "Trace setup and control"
> + select RELAY
> + select DEBUG_FS
> + help
> + This option provides support for the setup, teardown and control
> + of tracing channels from kernel code. It also provides trace
> + information and control to userspace via a set of debugfs control
> + files. If unsure, say N.
> +
select is evil - you really want to avoid using it.
The problem is where you select a symbol whose dependencies aren't met.
Kconfig resolves this incompatibility by just not selecting the thing you
wanted, iirc. So your CONFIG_SYSFS=n, CONFIG_TRACE=y kernel won't build.
> +/*
> + * Based on blktrace code, Copyright (C) 2006 Jens Axboe <axboe@suse.de>
So can we migrate blktrace to using this?
> +static ssize_t state_write(struct file *filp, const char __user *buffer,
> + size_t count, loff_t *ppos)
> +{
> + struct trace_info *trace = filp->private_data;
> + char buf[16] = { '\0' };
this initialisation isn't needed and will waste cycles.
> + int ret;
> +
> + if (trace->flags & TRACE_DISABLE_STATE)
> + return -EINVAL;
> +
> + if (count > sizeof(buf) - 1)
> + return -EINVAL;
> +
> + if (copy_from_user(buf, buffer, count))
> + return -EFAULT;
> +
> + buf[count] = '\0';
> +
> + if (strncmp(buf, "start", strlen("start")) == 0 ) {
> + ret = trace_start(trace);
> + if (ret)
> + return ret;
> + } else if (strncmp(buffer, "stop", strlen("stop")) == 0)
> + trace_stop(trace);
> + else
> + return -EINVAL;
What's the above code doing? Trying to cope with trailing chars after
"start" or "stop"? Is that actually needed? It's the \n, I assume?
> + return count;
> +}
> +
> +
> +static struct file_operations state_fops = {
> + .owner = THIS_MODULE,
> + .open = state_open,
> + .read = state_read,
> + .write = state_write,
> +};
> +
> +
> +static void remove_root(struct trace_info *trace)
> +{
> + if (trace->root->root && simple_empty(trace->root->root)) {
> + debugfs_remove(trace->root->root);
> + list_del(&trace->root->list);
> + kfree(trace->root);
> + trace->root = NULL;
> + }
> +}
> +
> +
> +static void remove_tree(struct trace_info *trace)
> +{
> + mutex_lock(&trace_mutex);
> +
> + debugfs_remove(trace->dir);
> +
> + if (--trace->root->users == 0)
> + remove_root(trace);
> +
> + mutex_unlock(&trace_mutex);
> +}
We usually only put a single blank line between functions. Two is just a
waste of screen space.
> +
> +
> +/*
> + * Creates the trace_root if it's not found.
> + */
>
> ...
>
> +static ssize_t sub_size_read(struct file *filp, char __user *buffer,
> + size_t count, loff_t *ppos)
> +{
> + struct trace_info *trace = filp->private_data;
> + char buf[32];
> +
> + snprintf(buf, sizeof(buf), "%u\n",
> + (unsigned int)trace->rchan->subbuf_size);
Use %tu to print a size_t, rather than the typecast.
> + return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
> +}
> +
>
> ...
>
> +static ssize_t nr_sub_read(struct file *filp, char __user *buffer,
> + size_t count, loff_t *ppos)
> +{
> + struct trace_info *trace = filp->private_data;
> + char buf[32];
> +
> + snprintf(buf, sizeof(buf), "%u\n",
> + (unsigned int)trace->rchan->n_subbufs);
Ditto. (It's unobvious why n_subbufs is a size_t)
> + return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
> +}
> +
>
> ...
>
> +static void remove_controls(struct trace_info *trace)
> +{
> + if (trace->state_file)
> + debugfs_remove(trace->state_file);
> + if (trace->dropped_file)
> + debugfs_remove(trace->dropped_file);
> + if (trace->reset_consumed_file)
> + debugfs_remove(trace->reset_consumed_file);
> + if (trace->nr_sub_file)
> + debugfs_remove(trace->nr_sub_file);
> + if (trace->sub_size_file)
> + debugfs_remove(trace->sub_size_file);
debugfs_remove(NULL) is legal: all the above tests can be removed.
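A rough user-space sketch of the simplification being asked for. The
debugfs_remove() below is only a stand-in that copies the "return
quietly on NULL" behaviour described above, struct trace_info is cut
down to the five control-file pointers, and remove_control_files() is a
hypothetical name, not something in the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct dentry; only tracks removal. */
struct dentry {
	int removed;
};

/* Stand-in for debugfs_remove(): a NULL argument is silently ignored. */
static void debugfs_remove(struct dentry *d)
{
	if (!d)
		return;
	d->removed = 1;
}

/* Reduced version of the patch's struct trace_info (assumption). */
struct trace_info {
	struct dentry *state_file;
	struct dentry *dropped_file;
	struct dentry *reset_consumed_file;
	struct dentry *nr_sub_file;
	struct dentry *sub_size_file;
};

/* No per-file NULL tests: the callee already copes with NULL. */
static void remove_control_files(struct trace_info *trace)
{
	assert(trace != NULL);

	debugfs_remove(trace->state_file);
	debugfs_remove(trace->dropped_file);
	debugfs_remove(trace->reset_consumed_file);
	debugfs_remove(trace->nr_sub_file);
	debugfs_remove(trace->sub_size_file);
}
```

With only some of the files created, the call still walks all five
pointers and touches only the ones that exist.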
> + if (trace->dir)
> + remove_tree(trace);
> +}
> +
>
> ...
>
> + * trace_setup: create a new trace trace handle
> + *
> + * @root: The root directory name in the root of the debugfs
> + * to place trace directories. Created as needed.
> + * @name: Trace directory name, created in @root
> + * @buf_size: size of the relay sub-buffers
> + * @buf_nr: number of relay sub-buffers
> + * @flags: Option selection (see GTSC channel flags definitions)
> + * default values when flags=0 are: use per-CPU buffering,
> + * use non-overwrite mode. See Documentation/trace.txt for details.
> + *
> + * returns a trace_info handle or NULL, if setup failed.
> + */
> +struct trace_info *trace_setup(const char *root, const char *name,
> + u32 buf_size, u32 buf_nr, u32 flags)
> +{
> + struct trace_info *trace = NULL;
> +
> + trace = setup_controls(root, name, flags);
> + if (!trace)
> + return NULL;
> +
> + trace->buf_size = buf_size;
> + trace->buf_nr = buf_nr;
> + trace->flags = flags;
> + mutex_init(&trace->state_mutex);
> + trace->state = TRACE_SETUP;
> +
> + return trace;
> +}
> +EXPORT_SYMBOL_GPL(trace_setup);
It's better for a pointer-returning function to return an ERR_PTR on error,
rather than NULL. That way the caller doesn't have to make guesses about
why the callee failed when propagating back an error. (See how your
init_module gives up and returns -1?)
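A minimal user-space sketch of that convention. The three helpers are
simplified re-implementations of the idea behind include/linux/err.h,
and trace_setup_sketch() and init_sketch() are hypothetical names with a
made-up argument check, so treat this as an illustration rather than
what the patch should literally contain:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Reduced stand-in for the patch's struct trace_info (assumption). */
struct trace_info {
	unsigned int buf_size;
	unsigned int buf_nr;
};

/* Returns a handle, or an ERR_PTR() carrying the reason for failure. */
static struct trace_info *trace_setup_sketch(unsigned int buf_size,
					     unsigned int buf_nr)
{
	struct trace_info *trace;

	if (buf_size == 0 || buf_nr == 0)
		return ERR_PTR(-EINVAL);

	trace = calloc(1, sizeof(*trace));
	if (!trace)
		return ERR_PTR(-ENOMEM);

	trace->buf_size = buf_size;
	trace->buf_nr = buf_nr;
	return trace;
}

/* The caller can hand the real errno back instead of a bare -1. */
static int init_sketch(unsigned int buf_size, unsigned int buf_nr)
{
	struct trace_info *trace = trace_setup_sketch(buf_size, buf_nr);

	if (IS_ERR(trace))
		return (int)PTR_ERR(trace);

	free(trace);
	return 0;
}
```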
> ...
>
> +
> +/**
> + * trace_cleanup_channel: destroys the trace channel only
> + *
> + * @trace: trace handle to cleanup
> + */
> +static void trace_cleanup_channel(struct trace_info *trace)
> +{
> + trace_stop(trace);
> + if (trace->rchan)
> + relay_close(trace->rchan);
relay_close(NULL) is legal. Please check the whole patch for this.
> + trace->rchan = NULL;
> +}
> +
> +/**
> + * trace_cleanup: destroys the trace channel, control files and dir
> + *
> + * @trace: trace handle to cleanup
> + */
> +void trace_cleanup(struct trace_info *trace)
> +{
> + trace_cleanup_channel(trace);
> + remove_controls(trace);
> + kfree(trace);
> +}
> +EXPORT_SYMBOL_GPL(trace_cleanup);
>
>
>
* Re: [PATCH 1/2] Trace code and documentation
2007-09-15 1:08 ` Andrew Morton
@ 2007-09-15 1:40 ` Randy Dunlap
2007-09-15 4:47 ` David Wilder
1 sibling, 0 replies; 9+ messages in thread
From: Randy Dunlap @ 2007-09-15 1:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Wilder, linux-kernel, SystemTAP
On Fri, 14 Sep 2007 18:08:40 -0700 Andrew Morton wrote:
> > Trace - Provides tracing primitives
> >
> > ...
> >
> > +config TRACE
> > + bool "Trace setup and control"
> > + select RELAY
> > + select DEBUG_FS
> > + help
> > + This option provides support for the setup, teardown and control
> > + of tracing channels from kernel code. It also provides trace
> > + information and control to userspace via a set of debugfs control
> > + files. If unsure, say N.
> > +
>
> select is evil - you really want to avoid using it.
I checked when I reviewed(?) this patch. There are a few other
places that select also (IIRC, the blktrace code does), but most of
them use depends on.
> The problem is where you select a symbol whose dependencies aren't met.
> Kconfig resolves this incompatibility by just not selecting the thing you
> wanted, iirc. So your CONFIG_SYSFS=n, CONFIG_TRACE=y kernel won't build.
>
> > +/*
> > + * Based on blktrace code, Copyright (C) 2006 Jens Axboe <axboe@suse.de>
>
> So can we migrate blktrace to using this?
> > +static ssize_t sub_size_read(struct file *filp, char __user *buffer,
> > + size_t count, loff_t *ppos)
> > +{
> > + struct trace_info *trace = filp->private_data;
> > + char buf[32];
> > +
> > + snprintf(buf, sizeof(buf), "%u\n",
> > + (unsigned int)trace->rchan->subbuf_size);
>
> Use %tu to print a size_t, rather than the typecast.
Eh?
Use %zu to print a size_t. Use %tu to print a ptrdiff_t.
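For illustration, the snprintf() call could then look roughly like
this. format_subbuf_size() is a hypothetical user-space helper that
mirrors the body of sub_size_read(); it is not code from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a size_t without the (unsigned int) cast. */
static int format_subbuf_size(char *buf, size_t len, size_t subbuf_size)
{
	assert(buf != NULL && len > 0);

	/* %zu is the conversion for size_t; %tu would be for ptrdiff_t. */
	return snprintf(buf, len, "%zu\n", subbuf_size);
}
```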
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
* Re: [PATCH 1/2] Trace code and documentation
2007-09-15 1:08 ` Andrew Morton
2007-09-15 1:40 ` Randy Dunlap
@ 2007-09-15 4:47 ` David Wilder
1 sibling, 0 replies; 9+ messages in thread
From: David Wilder @ 2007-09-15 4:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, SystemTAP
Andrew Morton wrote:
>> +/*
>> + * Based on blktrace code, Copyright (C) 2006 Jens Axboe <axboe@suse.de>
>>
>
> So can we migrate blktrace to using this?
>
Yes, a blktrace patch is coming.
>> + int ret;
>> +
>> + if (trace->flags & TRACE_DISABLE_STATE)
>> + return -EINVAL;
>> +
>> + if (count > sizeof(buf) - 1)
>> + return -EINVAL;
>> +
>> + if (copy_from_user(buf, buffer, count))
>> + return -EFAULT;
>> +
>> + buf[count] = '\0';
>> +
>> + if (strncmp(buf, "start", strlen("start")) == 0 ) {
>> + ret = trace_start(trace);
>> + if (ret)
>> + return ret;
>> + } else if (strncmp(buffer, "stop", strlen("stop")) == 0)
>> + trace_stop(trace);
>> + else
>> + return -EINVAL;
>>
>
> What's the above code doing? Trying to cope with trailing chars after
> "start" or "stop"? Is that actually needed? It's the \n, I assume?
>
Yes, the typical usage is "echo start > state" and echo adds a \n.
Thanks for the comments, I will make the changes and resubmit.
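If the trailing newline is the only thing being tolerated, one option
is to strip it and then compare the whole word. The sketch below is
plain user-space C with a hypothetical helper name (parse_state_cmd);
it always looks at the kernel-side copy, whereas the quoted "stop"
branch passes the __user pointer `buffer` to strncmp(), which appears
to be a slip for `buf`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum state_cmd { CMD_INVALID, CMD_START, CMD_STOP };

/*
 * Match a complete command word, allowing the single trailing '\n'
 * that "echo start > state" appends. Anything else is rejected.
 */
static enum state_cmd parse_state_cmd(const char *buf, size_t count)
{
	assert(buf != NULL);

	/* Drop one trailing newline, if present. */
	if (count > 0 && buf[count - 1] == '\n')
		count--;

	if (count == strlen("start") && memcmp(buf, "start", count) == 0)
		return CMD_START;
	if (count == strlen("stop") && memcmp(buf, "stop", count) == 0)
		return CMD_STOP;

	return CMD_INVALID;
}
```

Unlike the prefix match in the patch, this rejects input such as
"startx"; whether that stricter behaviour is wanted is a judgment call.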
* Re: [PATCH 1/2] Trace code and documentation
2007-09-14 17:50 ` Sam Ravnborg
@ 2007-09-15 4:49 ` David Wilder
2007-09-15 5:18 ` Sam Ravnborg
0 siblings, 1 reply; 9+ messages in thread
From: David Wilder @ 2007-09-15 4:49 UTC (permalink / raw)
To: Sam Ravnborg; +Cc: linux-kernel, SystemTAP, akpm
Sam Ravnborg wrote:
> Hi David.
>
> A random comment to the code.
> Several of the struct file_operations are not declared static as
> they should be.
>
> Btw. it looks good from a coding style point-of-view.
>
> About the name what about ktrace??
>
> Sam
>
>
Thanks for the comment. I sure don't want to change the name a fourth
time, can we live with "trace"?
* Re: [PATCH 1/2] Trace code and documentation
2007-09-15 4:49 ` David Wilder
@ 2007-09-15 5:18 ` Sam Ravnborg
0 siblings, 0 replies; 9+ messages in thread
From: Sam Ravnborg @ 2007-09-15 5:18 UTC (permalink / raw)
To: David Wilder; +Cc: linux-kernel, SystemTAP, akpm
On Fri, Sep 14, 2007 at 09:49:31PM -0700, David Wilder wrote:
> Sam Ravnborg wrote:
> >Hi David.
> >
> >A random comment to the code.
> >Several of the struct file_operations are not declared static as
> >they should be.
> >
> >Btw. it looks good from a coding style point-of-view.
> >
> >About the name what about ktrace??
> >
> > Sam
> >
> >
> Thanks for the comment. I sure don't want to change the name a forth
> time, can we live with "trace"?
I do not care much about the name, so no big deal for me.
It was just so generic.
Sam
* Re: [PATCH 1/2] Trace code and documentation
2007-09-13 23:43 [PATCH 1/2] Trace code and documentation David Wilder
` (2 preceding siblings ...)
2007-09-15 1:08 ` Andrew Morton
@ 2007-09-15 16:01 ` Mathieu Desnoyers
3 siblings, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2007-09-15 16:01 UTC (permalink / raw)
To: David Wilder; +Cc: linux-kernel, SystemTAP, akpm
Hi David,
Interesting work, but I think we could still enhance it. The interesting
thing you bring is the trace control through debugfs files, which is
clear and simple. (I did it on top of netlink in LTTng, but I don't
really care about the mechanism, as long as we have the same
flexibility).
* David Wilder (dwilder@us.ibm.com) wrote:
[...]
>
> +Overwrite mode can be called "flight recorder mode". Flight recorder
> +mode is selected by setting the TRACE_FLIGHT_CHANNEL flag when
> +creating trace channels. In flight mode when a tracing buffer is
> +full, the oldest records in the buffer will be discarded to make room
> +as new records arrive. In the default non-overwrite mode, new records
> +may be written only if the buffer has room. In either case, to
> +prevent data loss, a user space reader must keep the buffers
> +drained. Trace provides a means to detect the number of records that
> +have been dropped due to a buffer-full condition (non-overwrite mode
> +only).
> +
Since, in the end, we can represent the "flight recorder" as a simple flag,
can we imagine setting/unsetting it while the trace is active ?
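To make the two modes concrete, here is a toy user-space ring buffer in
which the mode is a single flag checked on every write, so flipping it
at run time is at least conceivable. It works on individual records
and ignores locking, whereas relay switches whole sub-buffers; the
names (struct ring, ring_write, ring_read) are invented for this sketch:

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 4

struct ring {
	int slot[RING_SLOTS];
	size_t head;           /* index of the oldest stored record */
	size_t len;            /* number of stored records */
	bool overwrite;        /* true: flight recorder mode */
	unsigned long dropped; /* counted in non-overwrite mode only */
};

static void ring_write(struct ring *r, int rec)
{
	if (r->len == RING_SLOTS) {
		if (!r->overwrite) {
			/* Non-overwrite: the new record is lost. */
			r->dropped++;
			return;
		}
		/* Flight recorder: discard the oldest record. */
		r->head = (r->head + 1) % RING_SLOTS;
		r->len--;
	}
	r->slot[(r->head + r->len) % RING_SLOTS] = rec;
	r->len++;
}

/* Returns false when the buffer is empty. */
static bool ring_read(struct ring *r, int *rec)
{
	if (r->len == 0)
		return false;
	*rec = r->slot[r->head];
	r->head = (r->head + 1) % RING_SLOTS;
	r->len--;
	return true;
}
```

Writing six records into a four-slot ring leaves records 1..4 plus a
drop count of 2 in non-overwrite mode, and records 3..6 with no drop
count in flight recorder mode.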
Also, why is trace creation an in-kernel API? What about a mkdir in
debugfs/trace? I guess this is because you need to keep a
pointer to the created trace so you can record your events in it. Have
you thought about keeping a global RCU list of active traces instead?
One could then iterate on every active trace and record information in
them without having to bother about which specific trace it has created.
However, this should come with the ability to filter in/out events in
the handler on a per-trace basis.
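Roughly what I have in mind, as a user-space sketch. It uses a plain
singly linked list where the kernel would use an RCU-protected one, the
struct and function names are made up, and "an empty filter accepts
every event" is an assumption of the sketch:

```c
#include <stddef.h>
#include <string.h>

#define MAX_FILTER 4

struct active_trace {
	const char *name;
	const char *accept[MAX_FILTER]; /* accepted event names */
	size_t n_accept;                /* 0 means accept everything */
	unsigned long recorded;         /* events written to this trace */
	struct active_trace *next;
};

/* Per-trace filter: does this trace want the named event? */
static int trace_accepts(const struct active_trace *t, const char *event)
{
	size_t i;

	if (t->n_accept == 0)
		return 1;
	for (i = 0; i < t->n_accept; i++)
		if (strcmp(t->accept[i], event) == 0)
			return 1;
	return 0;
}

/*
 * Called from an event source: offer the event to every active trace.
 * Returns how many traces recorded it.
 */
static int record_event(struct active_trace *head, const char *event)
{
	struct active_trace *t;
	int hits = 0;

	for (t = head; t; t = t->next) {
		if (!trace_accepts(t, event))
			continue;
		t->recorded++; /* stands in for the real buffer write */
		hits++;
	}
	return hits;
}
```

The event source never holds a pointer to a particular trace; it only
walks the list.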
The problem with the approach you propose is that it seems to tie a
specific event source to one trace channel.
It would be good to have some way to separate:
- event sources (markers/kprobes/...)
- active traces.
Each of them would have an event filter.
- within each trace, the ability to create _multiple_ channels, so we
can send the information in high/medium/low event rate channels on a
per-event basis. This is really useful to gather hybrid traces made
from flight recorder channels (high event rate) and non-overwrite channels
(important low rate information required to understand the trace).
- Each trace channel would be either global or per cpu, and would be a
flight recorder channel or "normal", non overwrite, channel.
> +When per-CPU buffers are used, relay creates one debugfs file for each
> +running CPU. The user-space consumer of the data is responsible for
> +reading the per-CPU buffers and collating the records presumably using
> +a time stamp or sequence number included in the trace records. The
> +use of global buffers eliminates this extra work of sequencing
> +records; however the provider's data layer must hold a lock when
> +writing records. The lock prevents writers running on different CPUs
> +from overwriting each other's data. However, buffering may be slower
> +because writes to the buffer are serialized. Global buffering is
> +selected by setting the TRACE_GLOBAL_CHANNEL flag when creating trace
> +channels.
> +
We could allocate the trace buffers upon actions that would be
independent from trace creation. By doing so, we could then do a
echo 1 > path_to_trace/channel/global
Before we activate the trace or allocate the buffers. I would vote for a
echo 1 > path_to_trace/channel/allocate
So we can separate the trace buffer allocation from trace start (because
the start operation might have to be done close to the studied events and
we want it to be as lightweight as possible).
So, typical usage could be:
cd /mnt/debugfs/trace
mkdir mytrace
cd mytrace
echo 1 > start
The default could be that we create a trace with a "main" set of per-cpu
channels. i.e.:
mytrace/main
But then, a mkdir within the mytrace directory could add new custom
channels:
in mytrace:
mkdir processes
cd processes
(then set buffer size, nr subbuf, flight vs non flight, global..)
echo 1 > allocate
By default, a trace event filter would accept all events. Events could
be identified by a name (see markers proposed subsystem_event name).
Issuing a :
echo 0 > path_to_trace/filter
would disable all events
echo "event_name" > path_to_trace/filter
would add the event_name to the trace filter
By default, events would be sent into the "main" channel, but
echo "event_name" > path_to_trace/channel/filter
would send the event in the "channel" channel instead.
We could think of integrating the markers macro into this scheme to
describe the events. Instead of doing an explicit trace write in the
breakpoint handler, we could simply put a marker (it could even be a
branch-free marker if you prefer). In LTTng, upon trace start, I iterate
on all the kernel's markers to record, in a "control" channel, all the
marker names, their ids (I assign them a 16 bit id), and their format
strings. It allows me to parse the trace given only the timestamps,
event IDs and event specific data.
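As a simplified illustration of that idea (not the actual LTTng code;
struct marker_desc and both functions are invented for this sketch), the
table written to the control channel and the lookup a trace parser
would do could be modelled like this:

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the table recorded in the "control" channel. */
struct marker_desc {
	uint16_t id;        /* compact id stored in each event */
	const char *name;   /* marker name */
	const char *format; /* format string describing the payload */
};

/* Number the markers 0..n-1; refuse tables that do not fit in 16 bits. */
static int assign_marker_ids(struct marker_desc *m, size_t n)
{
	size_t i;

	if (n > (size_t)UINT16_MAX + 1)
		return -1;
	for (i = 0; i < n; i++)
		m[i].id = (uint16_t)i;
	return 0;
}

/* Parser side: map an event id back to its description, or NULL. */
static const struct marker_desc *find_marker(const struct marker_desc *m,
					     size_t n, uint16_t id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (m[i].id == id)
			return &m[i];
	return NULL;
}
```

Each event in the data channels then only needs to carry the 16 bit id;
the name and format string are looked up from the table.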
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68