* [RFC][PATCH 0/5] tracing/events: stable tracepoints
@ 2010-11-17 0:53 Steven Rostedt
From: Steven Rostedt @ 2010-11-17 0:53 UTC
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers
[ RFC ONLY - Not for inclusion ]
As discussed at Kernel Summit, there were some issues about what to
do with tracepoints.
Basically, anyone, anywhere, any developer, can create a tracepoint
and have it appear in /sys/kernel/debug/tracing/events/...
These events automatically appear in both perf and ftrace as events.
And any tool can tap into them. That's where the problem arises.
What happens when a tool starts to depend on a tracepoint?
Will that tracepoint always be there? Will it ever change?
The problem also extends to the fact that we can't guarantee that
tracepoints will stay as is. There are literally hundreds of
tracepoints, and developers use them as in-field debugging tools.
As the kernel changes, so will these tracepoints.
A developer can use these to ask a customer who has run into some
problem to enable a trace and send it back, so that the developer
can go off and analyze it.
But for tools, this is a different story. They want and depend on
a tracepoint to be stable. If it changes under them, then it makes
tracepoints completely useless for tools.
This patch series is a start and RFC for the creation of
stable tracepoints. I will now call the current tracepoints raw
or in-field-debugging tracepoints or events. What I call stable tracepoints
are those meant to answer questions about the OS, not to help
a developer debug their code.
What I propose is to create a new format and a new filesystem called
eventfs. Like debugfs, when enabled, a directory will be created:
/sys/kernel/events
which would be the normal place to mount the eventfs filesystem.
The old format for events looked like this:
$ cat /debug/tracing/events/sched/sched_switch/format
name: sched_switch
ID: 57
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int common_lock_depth; offset:8; size:4; signed:1;
field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1;
field:pid_t prev_pid; offset:28; size:4; signed:1;
field:int prev_prio; offset:32; size:4; signed:1;
field:long prev_state; offset:40; size:8; signed:1;
field:char next_comm[TASK_COMM_LEN]; offset:48; size:16; signed:1;
field:pid_t next_pid; offset:64; size:4; signed:1;
field:int next_prio; offset:68; size:4; signed:1;
print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s ==> next_comm=%s next_pid=%d next_prio=%d", REC->prev_comm, REC->prev_pid, REC->prev_prio, REC->prev_state ? __print_flags(REC->prev_state, "|", { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" }, { 16, "Z" }, { 32, "X" }, { 64, "x" }, { 128, "W" }) : "R", REC->next_comm, REC->next_pid, REC->next_prio
The "common" fields were specific to ftrace (and, because perf attached
to it, to perf as well). Also, the size is in bytes, which limits the
ability to use bit fields. We also don't know about any arch-specific
alignment that may be needed to write to these fields.
We also have name (redundant), ID (should be agnostic), and print_fmt
(lots of issues).
So the new format looks like this:
[root@bxf ~]# cat /sys/kernel/event/sched_switch/format
array:prev_comm type:char size:8 count:16 align:1 signed:1;
field:prev_pid type:pid_t size:32 align:4 signed:1;
field:prev_state type:char size:8 align:1 signed:1;
array:next_comm type:char size:8 count:16 align:1 signed:1;
field:next_pid type:pid_t size:32 align:4 signed:1;
Some notes:
o The size is in bits.
o We added an align field, which is the natural alignment of that
type on the arch.
o We added an "array" type that specifies the size of an element as
well as a "count"; the total size can be align(size) * count.
o We separated the field name from the type.
Not in this series, but in the future (after we agree on all this), I would
like to move the raw tracepoints into /debug/events/... and have the
same format as here.
This patch series uses some of the same tricks as the TRACE_EVENT() code.
It has magic macros to do all the redundant code, but it still requires
a bit of manual work.
Right now, when a STABLE_EVENT() is created, the format appears,
but nothing hooks into it yet. perf, trace, or ftrace could register
a handler, either manually or by using the same magic macro tricks
to automate all the stable events. The design has been made to
allow for that too.
The last two patches create two stable tracepoints, sched_switch
and sched_migrate_task (as examples, as well as to get the ball rolling).
As you may have already noticed, there is currently no hierarchy with
the stable events. We want to limit the number of stable events, as they
should only be created to help answer general questions about the OS.
All events reside at the top layer of the eventfs filesystem.
(I do not plan on doing this for the raw events though).
Another note is that all stable events need a corresponding raw event.
The raw event does not need to be of the same format as the stable
event, it just needs to provide all the information that the stable
event needs, but the raw event may supply much more. This should
not be a problem, since the tracepoint that represents a stable event
should, by definition, always be stable :-)
Because the stable events piggyback on top of the raw events, the
trace_...() function in the kernel can be used by both. No changes
are needed there. As long as there's already a tracepoint
represented by a raw event, a stable event can be placed on top.
The raw event may change at any time, as long as it always supplies
the stable event with what it needs; the hooks between them will then
need to be updated. Because of the way tracepoints work, if they become
out of sync, the code will fail to compile.
Time to get out the hose!
-- Steve
The following patches are in:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
branch: rfc/events
Steven Rostedt (5):
events: Add EVENT_FS the event filesystem
tracing/events: Add code to (un)register stable events
tracing/events: Add infrastructure to show stable event formats
tracing/events: Add stable event sched_switch
tracing/events: Add sched_migrate_task stable event
----
fs/Kconfig | 6 +
fs/Makefile | 1 +
fs/eventfs/Makefile | 4 +
fs/eventfs/file.c | 53 +++++
fs/eventfs/inode.c | 433 ++++++++++++++++++++++++++++++++++++++++++
include/linux/eventfs.h | 83 ++++++++
include/linux/magic.h | 3 +-
include/trace/stable.h | 72 +++++++
include/trace/stable/sched.h | 33 ++++
include/trace/stable_list.h | 3 +
kernel/Makefile | 1 +
kernel/events/Makefile | 1 +
kernel/events/event_format.c | 74 +++++++
kernel/events/event_format.h | 64 ++++++
kernel/events/event_reg.h | 79 ++++++++
kernel/events/events.c | 48 +++++
kernel/trace/Kconfig | 1 +
17 files changed, 958 insertions(+), 1 deletions(-)
* [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
From: Steven Rostedt @ 2010-11-17 0:53 UTC
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers, Greg KH
From: Steven Rostedt <srostedt@redhat.com>
Copied mostly from debugfs, eventfs is the filesystem that will
hold the stable tracepoints. As of this patch, nothing enables
this filesystem yet.
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
fs/Kconfig | 6 +
fs/Makefile | 1 +
fs/eventfs/Makefile | 4 +
fs/eventfs/file.c | 53 ++++++
fs/eventfs/inode.c | 433 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/eventfs.h | 83 +++++++++
include/linux/magic.h | 3 +-
7 files changed, 582 insertions(+), 1 deletions(-)
create mode 100644 fs/eventfs/Makefile
create mode 100644 fs/eventfs/file.c
create mode 100644 fs/eventfs/inode.c
create mode 100644 include/linux/eventfs.h
diff --git a/fs/Kconfig b/fs/Kconfig
index 771f457..ffd3a8b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -250,6 +250,12 @@ source "fs/partitions/Kconfig"
endmenu
endif
+config EVENT_FS
+ bool
+ help
+ The event filesystem, usually mounted at /sys/kernel/events.
+ This option is selected by other configs that require it.
+
source "fs/nls/Kconfig"
source "fs/dlm/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index a7f7cef..4fe02d4 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -116,6 +116,7 @@ obj-$(CONFIG_HOSTFS) += hostfs/
obj-$(CONFIG_HPPFS) += hppfs/
obj-$(CONFIG_CACHEFILES) += cachefiles/
obj-$(CONFIG_DEBUG_FS) += debugfs/
+obj-$(CONFIG_EVENT_FS) += eventfs/
obj-$(CONFIG_OCFS2_FS) += ocfs2/
obj-$(CONFIG_BTRFS_FS) += btrfs/
obj-$(CONFIG_GFS2_FS) += gfs2/
diff --git a/fs/eventfs/Makefile b/fs/eventfs/Makefile
new file mode 100644
index 0000000..5e991a5
--- /dev/null
+++ b/fs/eventfs/Makefile
@@ -0,0 +1,4 @@
+eventfs-objs := inode.o file.o
+
+obj-$(CONFIG_EVENT_FS) += eventfs.o
+
diff --git a/fs/eventfs/file.c b/fs/eventfs/file.c
new file mode 100644
index 0000000..7b4cd64
--- /dev/null
+++ b/fs/eventfs/file.c
@@ -0,0 +1,53 @@
+/*
+ * file.c - part of eventfs, hierarchy structure for stable tracepoints
+ *
+ * Initially copied from debugfs which is:
+ * Copyright (C) 2004 Greg Kroah-Hartman <greg@kroah.com>
+ * Copyright (C) 2004 IBM Inc.
+ *
+ * Conversion to eventfs:
+ * Copyright (C) 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * eventfs is for stable tracepoints. Linus has stated that these tracepoints
+ * should not contain any module tracepoints, thus no function will be
+ * exported.
+ * See Documentation/DocBook/filesystems for more details.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+#include <linux/namei.h>
+#include <linux/eventfs.h>
+
+static ssize_t default_read_file(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ return 0;
+}
+
+static ssize_t default_write_file(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ return count;
+}
+
+static int default_open(struct inode *inode, struct file *file)
+{
+ if (inode->i_private)
+ file->private_data = inode->i_private;
+
+ return 0;
+}
+
+const struct file_operations eventfs_file_operations = {
+ .read = default_read_file,
+ .write = default_write_file,
+ .open = default_open,
+ .llseek = noop_llseek,
+};
diff --git a/fs/eventfs/inode.c b/fs/eventfs/inode.c
new file mode 100644
index 0000000..07b8091
--- /dev/null
+++ b/fs/eventfs/inode.c
@@ -0,0 +1,433 @@
+/*
+ * inode.c - part of eventfs, hierarchy structure for stable tracepoints
+ *
+ * Initially copied from debugfs which is:
+ * Copyright (C) 2004 Greg Kroah-Hartman <greg@kroah.com>
+ * Copyright (C) 2004 IBM Inc.
+ *
+ * Conversion to eventfs:
+ * Copyright (C) 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * eventfs is for stable tracepoints. Linus has stated that these tracepoints
+ * should not contain any module tracepoints, thus no function will be
+ * exported.
+ * See Documentation/DocBook/filesystems for more details.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/pagemap.h>
+#include <linux/init.h>
+#include <linux/kobject.h>
+#include <linux/namei.h>
+#include <linux/eventfs.h>
+#include <linux/fsnotify.h>
+#include <linux/string.h>
+#include <linux/magic.h>
+#include <linux/slab.h>
+
+static struct vfsmount *eventfs_mount;
+static int eventfs_mount_count;
+static bool eventfs_registered;
+
+static struct inode *
+eventfs_get_inode(struct super_block *sb, int mode, dev_t dev,
+ void *data, const struct file_operations *fops)
+
+{
+ struct inode *inode;
+
+ /* links are not currently supported */
+ if ((mode & S_IFMT) == S_IFLNK)
+ return NULL;
+
+ inode = new_inode(sb);
+
+ if (inode) {
+ inode->i_ino = get_next_ino();
+ inode->i_mode = mode;
+ inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+ switch (mode & S_IFMT) {
+ default:
+ init_special_inode(inode, mode, dev);
+ break;
+ case S_IFREG:
+ inode->i_fop = fops ? fops : &eventfs_file_operations;
+ inode->i_private = data;
+ break;
+ case S_IFDIR:
+ inode->i_op = &simple_dir_inode_operations;
+ inode->i_fop = fops ? fops : &simple_dir_operations;
+ inode->i_private = data;
+
+ /* directory inodes start off with i_nlink == 2
+ * (for "." entry) */
+ inc_nlink(inode);
+ break;
+ }
+ }
+ return inode;
+}
+
+/* SMP-safe */
+static int eventfs_mknod(struct inode *dir, struct dentry *dentry,
+ int mode, dev_t dev, void *data,
+ const struct file_operations *fops)
+{
+ struct inode *inode;
+ int error = -EPERM;
+
+ if (dentry->d_inode)
+ return -EEXIST;
+
+ inode = eventfs_get_inode(dir->i_sb, mode, dev, data, fops);
+ if (inode) {
+ d_instantiate(dentry, inode);
+ dget(dentry);
+ error = 0;
+ }
+ return error;
+}
+
+static int eventfs_mkdir(struct inode *dir, struct dentry *dentry, int mode,
+ void *data, const struct file_operations *fops)
+{
+ int res;
+
+ mode = (mode & (S_IRWXUGO | S_ISVTX)) | S_IFDIR;
+ res = eventfs_mknod(dir, dentry, mode, 0, data, fops);
+ if (!res) {
+ inc_nlink(dir);
+ fsnotify_mkdir(dir, dentry);
+ }
+ return res;
+}
+
+static int eventfs_create(struct inode *dir, struct dentry *dentry, int mode,
+ void *data, const struct file_operations *fops)
+{
+ int res;
+
+ mode = (mode & S_IALLUGO) | S_IFREG;
+ res = eventfs_mknod(dir, dentry, mode, 0, data, fops);
+ if (!res)
+ fsnotify_create(dir, dentry);
+ return res;
+}
+
+static inline int eventfs_positive(struct dentry *dentry)
+{
+ return dentry->d_inode && !d_unhashed(dentry);
+}
+
+static int event_fill_super(struct super_block *sb, void *data, int silent)
+{
+ static struct tree_descr event_files[] = { {""} };
+
+ return simple_fill_super(sb, EVENTFS_MAGIC, event_files);
+}
+
+static struct dentry *event_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name,
+ void *data)
+{
+ return mount_single(fs_type, flags, data, event_fill_super);
+}
+
+static struct file_system_type event_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "eventfs",
+ .mount = event_mount,
+ .kill_sb = kill_litter_super,
+};
+
+static int eventfs_create_by_name(const char *name, mode_t mode,
+ struct dentry *parent,
+ struct dentry **dentry,
+ void *data,
+ const struct file_operations *fops)
+{
+ int error = 0;
+
+ /* If the parent is not specified, we create it in the root.
+ * We need the root dentry to do this, which is in the super
+ * block. A pointer to that is in the struct vfsmount that we
+ * have around.
+ */
+ if (!parent)
+ parent = eventfs_mount->mnt_sb->s_root;
+
+ *dentry = NULL;
+ mutex_lock(&parent->d_inode->i_mutex);
+ *dentry = lookup_one_len(name, parent, strlen(name));
+ if (!IS_ERR(*dentry)) {
+ switch (mode & S_IFMT) {
+ case S_IFDIR:
+ error = eventfs_mkdir(parent->d_inode, *dentry, mode,
+ data, fops);
+ break;
+ case S_IFLNK:
+ error = -EINVAL;
+ break;
+ default:
+ error = eventfs_create(parent->d_inode, *dentry, mode,
+ data, fops);
+ break;
+ }
+ dput(*dentry);
+ } else
+ error = PTR_ERR(*dentry);
+ mutex_unlock(&parent->d_inode->i_mutex);
+
+ return error;
+}
+
+/**
+ * eventfs_create_file - create a file in the eventfs filesystem
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ * directory dentry if set. If this parameter is NULL, then the
+ * file will be created in the root of the eventfs filesystem.
+ * @data: a pointer to something that the caller will want to get to later
+ * on. The inode.i_private pointer will point to this value on
+ * the open() call.
+ * @fops: a pointer to a struct file_operations that should be used for
+ * this file.
+ *
+ * This is the basic "create a file" function for eventfs. It allows for a
+ * wide range of flexibility in creating a file, or a directory (if you want
+ * to create a directory, the eventfs_create_dir() function is
+ * recommended to be used instead.)
+ *
+ * This function will return a pointer to a dentry if it succeeds. This
+ * pointer must be passed to the eventfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.) If an error occurs, %NULL will be returned.
+ *
+ * If eventfs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+struct dentry *eventfs_create_file(const char *name, mode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops)
+{
+ struct dentry *dentry = NULL;
+ int error;
+
+ pr_debug("eventfs: creating file '%s'\n", name);
+
+ error = simple_pin_fs(&event_fs_type, &eventfs_mount,
+ &eventfs_mount_count);
+ if (error)
+ goto exit;
+
+ error = eventfs_create_by_name(name, mode, parent, &dentry,
+ data, fops);
+ if (error) {
+ dentry = NULL;
+ simple_release_fs(&eventfs_mount, &eventfs_mount_count);
+ goto exit;
+ }
+exit:
+ return dentry;
+}
+
+/**
+ * eventfs_create_dir - create a directory in the eventfs filesystem
+ * @name: a pointer to a string containing the name of the directory to
+ * create.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ * directory dentry if set. If this parameter is NULL, then the
+ * directory will be created in the root of the eventfs filesystem.
+ *
+ * This function creates a directory in eventfs with the given name.
+ *
+ * This function will return a pointer to a dentry if it succeeds. This
+ * pointer must be passed to the eventfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.) If an error occurs, %NULL will be returned.
+ *
+ * If eventfs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+struct dentry *eventfs_create_dir(const char *name, struct dentry *parent)
+{
+ return eventfs_create_file(name,
+ S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
+ parent, NULL, NULL);
+}
+
+static void __eventfs_remove(struct dentry *dentry, struct dentry *parent)
+{
+ int ret = 0;
+
+ if (eventfs_positive(dentry)) {
+ if (dentry->d_inode) {
+ dget(dentry);
+ switch (dentry->d_inode->i_mode & S_IFMT) {
+ case S_IFDIR:
+ ret = simple_rmdir(parent->d_inode, dentry);
+ break;
+ case S_IFLNK:
+ kfree(dentry->d_inode->i_private);
+ /* fall through */
+ default:
+ simple_unlink(parent->d_inode, dentry);
+ break;
+ }
+ if (!ret)
+ d_delete(dentry);
+ dput(dentry);
+ }
+ }
+}
+
+/**
+ * eventfs_remove - removes a file or directory from the eventfs filesystem
+ * @dentry: a pointer to the dentry of the file or directory to be
+ * removed.
+ *
+ * This function removes a file or directory in eventfs that was previously
+ * created with a call to another eventfs function (like
+ * eventfs_create_file() or variants thereof.)
+ *
+ * This function is required to be called in order for the file to be
+ * removed, no automatic cleanup of files will happen when a module is
+ * removed, you are responsible here.
+ */
+void eventfs_remove(struct dentry *dentry)
+{
+ struct dentry *parent;
+
+ if (!dentry)
+ return;
+
+ parent = dentry->d_parent;
+ if (!parent || !parent->d_inode)
+ return;
+
+ mutex_lock(&parent->d_inode->i_mutex);
+ __eventfs_remove(dentry, parent);
+ mutex_unlock(&parent->d_inode->i_mutex);
+ simple_release_fs(&eventfs_mount, &eventfs_mount_count);
+}
+
+/**
+ * eventfs_remove_recursive - recursively removes a directory
+ * @dentry: a pointer to the dentry of the directory to be removed.
+ *
+ * This function recursively removes a directory tree in eventfs that
+ * was previously created with a call to another eventfs function
+ * (like eventfs_create_file() or variants thereof.)
+ *
+ * This function is required to be called in order for the file to be
+ * removed, no automatic cleanup of files will happen when a module is
+ * removed, you are responsible here.
+ */
+void eventfs_remove_recursive(struct dentry *dentry)
+{
+ struct dentry *child;
+ struct dentry *parent;
+
+ if (!dentry)
+ return;
+
+ parent = dentry->d_parent;
+ if (!parent || !parent->d_inode)
+ return;
+
+ parent = dentry;
+ mutex_lock(&parent->d_inode->i_mutex);
+
+ while (1) {
+ /*
+ * When all dentries under "parent" have been removed,
+ * walk up the tree until we reach our starting point.
+ */
+ if (list_empty(&parent->d_subdirs)) {
+ mutex_unlock(&parent->d_inode->i_mutex);
+ if (parent == dentry)
+ break;
+ parent = parent->d_parent;
+ mutex_lock(&parent->d_inode->i_mutex);
+ }
+ child = list_entry(parent->d_subdirs.next, struct dentry,
+ d_u.d_child);
+ next_sibling:
+
+ /*
+ * If "child" isn't empty, walk down the tree and
+ * remove all its descendants first.
+ */
+ if (!list_empty(&child->d_subdirs)) {
+ mutex_unlock(&parent->d_inode->i_mutex);
+ parent = child;
+ mutex_lock(&parent->d_inode->i_mutex);
+ continue;
+ }
+ __eventfs_remove(child, parent);
+ if (parent->d_subdirs.next == &child->d_u.d_child) {
+ /*
+ * Try the next sibling.
+ */
+ if (child->d_u.d_child.next != &parent->d_subdirs) {
+ child = list_entry(child->d_u.d_child.next,
+ struct dentry,
+ d_u.d_child);
+ goto next_sibling;
+ }
+
+ /*
+ * Avoid infinite loop if we fail to remove
+ * one dentry.
+ */
+ mutex_unlock(&parent->d_inode->i_mutex);
+ break;
+ }
+ simple_release_fs(&eventfs_mount, &eventfs_mount_count);
+ }
+
+ parent = dentry->d_parent;
+ mutex_lock(&parent->d_inode->i_mutex);
+ __eventfs_remove(dentry, parent);
+ mutex_unlock(&parent->d_inode->i_mutex);
+ simple_release_fs(&eventfs_mount, &eventfs_mount_count);
+}
+
+/**
+ * eventfs_initialized - Tells whether eventfs has been registered
+ */
+bool eventfs_initialized(void)
+{
+ return eventfs_registered;
+}
+
+
+static struct kobject *event_kobj;
+
+static int __init eventfs_init(void)
+{
+ int retval;
+
+ event_kobj = kobject_create_and_add("event", kernel_kobj);
+ if (!event_kobj)
+ return -EINVAL;
+
+ retval = register_filesystem(&event_fs_type);
+ if (retval)
+ kobject_put(event_kobj);
+ else
+ eventfs_registered = true;
+
+ return retval;
+}
+
+core_initcall(eventfs_init);
+
diff --git a/include/linux/eventfs.h b/include/linux/eventfs.h
new file mode 100644
index 0000000..95b5c42
--- /dev/null
+++ b/include/linux/eventfs.h
@@ -0,0 +1,83 @@
+/*
+ * eventfs.h - filesystem for stable events
+ *
+ * Initially copied from debugfs which is:
+ * Copyright (C) 2004 Greg Kroah-Hartman <greg@kroah.com>
+ * Copyright (C) 2004 IBM Inc.
+ *
+ * Conversion to eventfs:
+ * Copyright (C) 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * eventfs is for stable tracepoints. Linus has stated that these tracepoints
+ * should not contain any module tracepoints, thus no function will be
+ * exported.
+ * See Documentation/DocBook/filesystems for more details.
+ */
+
+#ifndef _EVENTFS_H_
+#define _EVENTFS_H_
+
+#include <linux/fs.h>
+
+#include <linux/types.h>
+
+struct file_operations;
+
+#if defined(CONFIG_EVENT_FS)
+
+/* declared over in file.c */
+extern const struct file_operations eventfs_file_operations;
+extern const struct inode_operations eventfs_link_operations;
+
+struct dentry *eventfs_create_file(const char *name, mode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops);
+
+struct dentry *eventfs_create_dir(const char *name, struct dentry *parent);
+
+void eventfs_remove(struct dentry *dentry);
+void eventfs_remove_recursive(struct dentry *dentry);
+
+bool eventfs_initialized(void);
+
+#else
+
+#include <linux/err.h>
+
+/*
+ * We do not return NULL from these functions if CONFIG_EVENT_FS is not enabled
+ * so users have a chance to detect if there was a real error or not. We don't
+ * want to duplicate the design decision mistakes of procfs and devfs again.
+ */
+
+static inline struct dentry *eventfs_create_file(const char *name, mode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops)
+{
+ return ERR_PTR(-ENODEV);
+}
+
+static inline struct dentry *eventfs_create_dir(const char *name,
+ struct dentry *parent)
+{
+ return ERR_PTR(-ENODEV);
+}
+
+static inline void eventfs_remove(struct dentry *dentry)
+{ }
+
+static inline void eventfs_remove_recursive(struct dentry *dentry)
+{ }
+
+static inline bool eventfs_initialized(void)
+{
+ return false;
+}
+
+#endif
+
+#endif
diff --git a/include/linux/magic.h b/include/linux/magic.h
index ff690d0..0126169 100644
--- a/include/linux/magic.h
+++ b/include/linux/magic.h
@@ -8,7 +8,8 @@
#define CODA_SUPER_MAGIC 0x73757245
#define CRAMFS_MAGIC 0x28cd3d45 /* some random number */
#define CRAMFS_MAGIC_WEND 0x453dcd28 /* magic number with the wrong endianess */
-#define DEBUGFS_MAGIC 0x64626720
+#define DEBUGFS_MAGIC 0x64626720
+#define EVENTFS_MAGIC 0x65667310
#define SYSFS_MAGIC 0x62656572
#define SECURITYFS_MAGIC 0x73636673
#define SELINUX_MAGIC 0xf97cff8c
--
1.7.1
* [RFC][PATCH 2/5] [PATCH 2/5] tracing/events: Add code to (un)register stable events
From: Steven Rostedt @ 2010-11-17 0:53 UTC
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers
From: Steven Rostedt <srostedt@redhat.com>
Add the framework to create and register stable events.
To create a stable event, add a file into:
include/trace/stable/myfile.h
With the following format:
STABLE_EVENT(myevent,
EVENT_STRUCT(
__field(type, item)
__array(type, item, count)
)
)
This will create the function prototype:
typedef void (*trace_proto_myevent)(void *data, type item, type *item);
And the functions:
int register_stable_trace_myevent(trace_proto_myevent func, void *data);
void unregister_stable_trace_myevent(trace_proto_myevent func, void *data);
This will allow developers to register a callback for a stable event.
The stable events must have a matching raw (in-field debugging)
trace event. Only the names must match; the information in the
stable event is just a subset of the information that can be
extracted from the raw event.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/stable.h | 72 +++++++++++++++++++++++++++++++++++++++
include/trace/stable_list.h | 2 +
kernel/Makefile | 1 +
kernel/events/Makefile | 1 +
kernel/events/event_reg.h | 79 +++++++++++++++++++++++++++++++++++++++++++
kernel/events/events.c | 16 +++++++++
kernel/trace/Kconfig | 1 +
7 files changed, 172 insertions(+), 0 deletions(-)
create mode 100644 include/trace/stable.h
create mode 100644 include/trace/stable_list.h
create mode 100644 kernel/events/Makefile
create mode 100644 kernel/events/event_reg.h
create mode 100644 kernel/events/events.c
diff --git a/include/trace/stable.h b/include/trace/stable.h
new file mode 100644
index 0000000..ee10f41
--- /dev/null
+++ b/include/trace/stable.h
@@ -0,0 +1,72 @@
+#ifndef _TRACE_STABLE_H
+#define _TRACE_STABLE_H
+/*
+ * stable.h - create structure and format for stable events
+ *
+ * Copyright (C) 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc
+ *
+ * This file creates the registering function prototypes
+ * to hook to the stable events and the function prototypes of those
+ * hooks.
+ *
+ * The stable events themselves reside in the directory:
+ * include/trace/stable/
+ *
+ * Each of these files must be added to the header:
+ * include/trace/stable_list.h
+ *
+ * Code that registers and unregisters stable events only needs
+ * to include this file:
+ * #include <trace/stable.h>
+ *
+ * and all the stable structures and registering functions will
+ * also be included for all stable events.
+ *
+ * The current elements that can be used by stable events are:
+ *
+ * __field(type, item)
+ * __array(type, time, size)
+ *
+ * Example:
+ * STABLE_EVENT(myevent,
+ * EVENT_STRUCT(
+ * __field( pid_t, pid )
+ * __array( char, comm, TASK_COMM_LEN )
+ * )
+ * )
+ *
+ * The above would create the following:
+ *
+ * typedef void (*trace_proto_myevent)(void *data, pid_t pid, char *comm);
+ * int register_stable_trace_myevent(trace_proto_myevent func, void *data);
+ * void unregister_stable_trace_myevent(trace_proto_myevent func, void *data);
+ *
+ * These functions will be created in kernel/events/events.c
+ */
+
+#define STABLE_HEADER_MULTI_READ
+
+#undef __SEP__
+#define __SEP__ ,
+
+#undef __field
+#define __field(type, item) type item
+
+#undef __array
+#define __array(type, item, size) type *item
+
+#undef EVENT_STRUCT
+#define EVENT_STRUCT(s) s
+
+#undef STABLE_EVENT
+#define STABLE_EVENT(name, estruct) \
+ typedef void (*trace_proto_##name)(void *data, estruct); \
+ int register_stable_trace_##name(trace_proto_##name func, void *data); \
+ void unregister_stable_trace_##name(trace_proto_##name func, void *data)
+
+/* New stable defines must be added here */
+#include <trace/stable_list.h>
+
+#undef STABLE_HEADER_MULTI_READ
+
+#endif /* _TRACE_STABLE_H */
diff --git a/include/trace/stable_list.h b/include/trace/stable_list.h
new file mode 100644
index 0000000..996932a
--- /dev/null
+++ b/include/trace/stable_list.h
@@ -0,0 +1,2 @@
+
+/* New stable defines must be added here */
diff --git a/kernel/Makefile b/kernel/Makefile
index 0b5ff08..8c1cbbd 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_event.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
obj-$(CONFIG_PADATA) += padata.o
+obj-y += events/
ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
new file mode 100644
index 0000000..86fd7c1
--- /dev/null
+++ b/kernel/events/Makefile
@@ -0,0 +1 @@
+obj-y += events.o
diff --git a/kernel/events/event_reg.h b/kernel/events/event_reg.h
new file mode 100644
index 0000000..dfe16df
--- /dev/null
+++ b/kernel/events/event_reg.h
@@ -0,0 +1,79 @@
+/*
+ * event_reg.h - code to register stable events
+ *
+ * Copyright (C) 2010 Steven Rostedt <srostedt@redhat.com> Red Hat Inc
+ *
+ * Create the registration functions for the stable events:
+ *
+ * int register_stable_trace_##name(func, data);
+ * void unregister_stable_trace_##name(func, data);
+ *
+ * The func is of the format based off of the STABLE_EVENT() structure.
+ * For example:
+ *
+ * STABLE_EVENT(myevent,
+ * EVENT_STRUCT(
+ * __field( pid_t, pid )
+ * __array( char, comm, TASK_COMM_LEN )
+ * )
+ * )
+ *
+ * Will yield a function prototype of:
+ *
+ * void func(void *data, pid_t pid, char *comm);
+ *
+ * When registering a callback for a stable event, whatever you
+ * pass in as data will be handed to your function as its data argument.
+ *
+ * Once you register it, the function will start being called when
+ * the corresponding tracepoint is hit.
+ *
+ * To stop tracing, just unregister the function. Note, both
+ * registering and unregistering must be called from sleepable
+ * context, since those functions may sleep.
+ */
+
+#define STABLE_HEADER_MULTI_READ
+
+#undef __SEP__
+#define __SEP__
+
+#undef __field
+#define __field(type, item)
+
+#undef __array
+#define __array(type, item, size)
+
+#undef STABLE_EVENT
+#define STABLE_EVENT(name, estruct) \
+static int name##_ref_count; \
+int register_stable_trace_##name(trace_proto_##name func, void *__data) \
+{ \
+ int ret; \
+ \
+ mutex_lock(&stable_event_mutex); \
+ ret = register_trace_stable_##name(func, __data); \
+ if (ret) \
+ goto out; \
+ \
+ if (name##_ref_count++ == 0) \
+ ret = register_trace_##name(hook_##name, NULL); \
+ \
+ out: \
+ mutex_unlock(&stable_event_mutex); \
+ return ret; \
+} \
+EXPORT_SYMBOL(register_stable_trace_##name); \
+ \
+void unregister_stable_trace_##name(trace_proto_##name func, void *__data) \
+{ \
+ mutex_lock(&stable_event_mutex); \
+ if (--name##_ref_count == 0) \
+ unregister_trace_##name(hook_##name, NULL); \
+ \
+ unregister_trace_stable_##name(func, __data); \
+ mutex_unlock(&stable_event_mutex); \
+} \
+EXPORT_SYMBOL(unregister_stable_trace_##name)
+
+#include <trace/stable_list.h>
diff --git a/kernel/events/events.c b/kernel/events/events.c
new file mode 100644
index 0000000..6868bf1
--- /dev/null
+++ b/kernel/events/events.c
@@ -0,0 +1,16 @@
+/*
+ * events.c - code to register stable events
+ *
+ * Copyright (C) 2010 Steven Rostedt <srostedt@redhat.com> Red Hat Inc
+ *
+ * This file holds the hooks that are required to convert
+ * a raw tracepoint into a stable one. The conversion hooks
+ * must be named: hook_##tracepoint_name.
+ */
+#include <linux/tracepoint.h>
+#include <linux/module.h>
+#include <trace/stable.h>
+
+static struct mutex stable_event_mutex;
+
+#include "event_reg.h"
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e04b8bc..0533182 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -94,6 +94,7 @@ config TRACING
select NOP_TRACER
select BINARY_PRINTF
select EVENT_TRACING
+ select EVENT_FS
config GENERIC_TRACER
bool
--
1.7.1
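The reference counting in event_reg.h can be modeled in userspace. This is a minimal sketch, not the kernel code: register_stable_model(), unregister_stable_model(), fire_raw_tracepoint(), count_cb() and the fixed-size probe table are hypothetical stand-ins for register_trace_stable_##name() and register_trace_##name(), and a pthread mutex plays the role of stable_event_mutex. The first successful registration attaches the raw hook, the last unregistration detaches it, and only a failed registration skips the reference count. The mutex is statically initialized; the kernel-side equivalent would presumably be static DEFINE_MUTEX(stable_event_mutex), since a zero-filled struct mutex is not a usable unlocked mutex.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CBS 8	/* hypothetical limit for this model */

/* Callback type: private data first, like trace_proto_##name. */
typedef void (*stable_cb)(void *data, int pid);

struct probe {
	stable_cb func;
	void *data;
};

/* Statically initialized, as DEFINE_MUTEX() would do in the kernel. */
static pthread_mutex_t stable_event_mutex = PTHREAD_MUTEX_INITIALIZER;
static int ref_count;		/* users of the stable event */
static int raw_hook_attached;	/* stands in for register_trace_##name(hook_##name) */
static struct probe probes[MAX_CBS];
static int nr_probes;

/* Stand-in for register_trace_stable_##name(): 0 on success, -1 if full. */
static int add_probe(stable_cb func, void *data)
{
	if (nr_probes == MAX_CBS)
		return -1;
	probes[nr_probes].func = func;
	probes[nr_probes].data = data;
	nr_probes++;
	return 0;
}

/* Stand-in for unregister_trace_stable_##name(). */
static void remove_probe(stable_cb func, void *data)
{
	for (int i = 0; i < nr_probes; i++) {
		if (probes[i].func == func && probes[i].data == data) {
			probes[i] = probes[--nr_probes];	/* swap in the last entry */
			return;
		}
	}
}

static int register_stable_model(stable_cb func, void *data)
{
	int ret;

	pthread_mutex_lock(&stable_event_mutex);
	ret = add_probe(func, data);
	if (ret)			/* skip the ref count only on failure */
		goto out;
	if (ref_count++ == 0)		/* first user attaches the raw hook */
		raw_hook_attached = 1;
out:
	pthread_mutex_unlock(&stable_event_mutex);
	return ret;
}

static void unregister_stable_model(stable_cb func, void *data)
{
	pthread_mutex_lock(&stable_event_mutex);
	assert(ref_count > 0);
	if (--ref_count == 0)		/* last user detaches the raw hook */
		raw_hook_attached = 0;
	remove_probe(func, data);
	pthread_mutex_unlock(&stable_event_mutex);
}

/*
 * What hook_##name does when the raw tracepoint fires: fan out to every
 * registered callback. This single-threaded model reads the table unlocked.
 */
static void fire_raw_tracepoint(int pid)
{
	if (!raw_hook_attached)
		return;
	for (int i = 0; i < nr_probes; i++)
		probes[i].func(probes[i].data, pid);
}

/* Example consumer: counts how often it was called. */
static void count_cb(void *data, int pid)
{
	(void)pid;
	(*(int *)data)++;
}
```

Each callback receives its private data pointer first, matching the trace_proto_##name typedef generated by stable.h.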
* [RFC][PATCH 3/5] [PATCH 3/5] tracing/events: Add infrastructure to show stable event formats
2010-11-17 0:53 [RFC][PATCH 0/5] tracing/events: stable tracepoints Steven Rostedt
2010-11-17 0:53 ` [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem Steven Rostedt
2010-11-17 0:53 ` [RFC][PATCH 2/5] [PATCH 2/5] tracing/events: Add code to (un)register stable events Steven Rostedt
@ 2010-11-17 0:54 ` Steven Rostedt
2010-11-17 0:54 ` [RFC][PATCH 4/5] [PATCH 4/5] tracing/events: Add stable event sched_switch Steven Rostedt
` (2 subsequent siblings)
5 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2010-11-17 0:54 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers
[-- Attachment #1: 0003-tracing-events-Add-infrastructure-to-show-stable-eve.patch --]
[-- Type: text/plain, Size: 4910 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Add the infrastructure to connect to the eventfs filesystem and
create an event directory. Currently, all events are flat; that is,
there is no hierarchy. Each stable event has its own directory
in /sys/kernel/events (after the eventfs is mounted there).
Currently only one file exists in each of the event directories:
the format file. For each item in the stable event, the format shows
the item's name, its type, its size (in bits), its count of entries
(if the item is an array), the alignment needed to read it
(arch specific) and its signedness.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/events/Makefile | 2 +-
kernel/events/event_format.c | 74 ++++++++++++++++++++++++++++++++++++++++++
kernel/events/event_format.h | 64 ++++++++++++++++++++++++++++++++++++
3 files changed, 139 insertions(+), 1 deletions(-)
create mode 100644 kernel/events/event_format.c
create mode 100644 kernel/events/event_format.h
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 86fd7c1..943bda5 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -1 +1 @@
-obj-y += events.o
+obj-y += events.o event_format.o
diff --git a/kernel/events/event_format.c b/kernel/events/event_format.c
new file mode 100644
index 0000000..705948a
--- /dev/null
+++ b/kernel/events/event_format.c
@@ -0,0 +1,74 @@
+/*
+ * event_format.c - code to show events in eventfs
+ *
+ * Copyright (C) 2010 Steven Rostedt <srostedt@redhat.com> Red Hat Inc
+ */
+#include <linux/eventfs.h>
+#include <linux/seq_file.h>
+
+typedef void (*event_format_func)(struct seq_file *m);
+
+static void *f_next(struct seq_file *m, void *v, loff_t *pos)
+{
+
+ if ((*pos)++ == 0)
+ return m->private;
+
+ return NULL;
+}
+
+static void *f_start(struct seq_file *m, loff_t *pos)
+{
+ /* It's all or nothing */
+ if ((*pos)++)
+ return NULL;
+
+ return m->private;
+}
+
+static int f_show(struct seq_file *m, void *v)
+{
+ event_format_func func = m->private;
+
+ if (!v)
+ return 0;
+
+ func(m);
+ return 0;
+}
+
+static void f_stop(struct seq_file *m, void *p)
+{
+}
+
+static const struct seq_operations event_format_seq_ops = {
+ .start = f_start,
+ .next = f_next,
+ .stop = f_stop,
+ .show = f_show,
+};
+
+static int event_format_open(struct inode *inode, struct file *file)
+{
+ struct seq_file *m;
+ int ret;
+
+ ret = seq_open(file, &event_format_seq_ops);
+ if (ret < 0)
+ return ret;
+
+ m = file->private_data;
+ m->private = inode->i_private;
+
+ return 0;
+}
+
+static const struct file_operations event_format_fops = {
+ .open = event_format_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+#include "event_format.h"
+
diff --git a/kernel/events/event_format.h b/kernel/events/event_format.h
new file mode 100644
index 0000000..aa43fc0
--- /dev/null
+++ b/kernel/events/event_format.h
@@ -0,0 +1,64 @@
+/*
+ * event_format.h - code to show events in eventfs
+ *
+ * Copyright (C) 2010 Steven Rostedt <srostedt@redhat.com> Red Hat Inc
+ *
+ * For every stable event a directory is created in the eventfs.
+ * For now, only one file exists in that directory, which is
+ * the format file. This format shows the format of that event as:
+ *
+ * field:name type:type size:bit-size align:alignment signed:signed;
+ *
+ * Or for arrays:
+ *
+ * array:name type:type size:bit-size count:items align:alignment signed:signed;
+ *
+ * The size is in bits. For the array, the size is of a single entry,
+ * and the count is the number of those entries.
+ */
+#include <linux/ftrace_event.h>
+
+#define STABLE_HEADER_MULTI_READ
+
+#undef __SEP__
+#define __SEP__
+
+#undef __field
+#define __field(type, item) \
+ seq_printf(m, "\tfield:%s\ttype:%s\tsize:%ld\talign:%ld\tsigned:%d;\n", \
+ #item, #type, sizeof(type) * 8, \
+ __alignof__(type), is_signed_type(type));
+
+#undef __array
+#define __array(type, item, size) \
+ seq_printf(m, "\tarray:%s\ttype:%s\tsize:%ld\tcount:%d\talign:%ld\tsigned:%d;\n", \
+ #item, #type, sizeof(type) * 8, size, \
+ __alignof__(type), is_signed_type(type));
+
+#define EVENT_STRUCT(s) s
+
+#undef STABLE_EVENT
+#define STABLE_EVENT(name, estruct) \
+static void format_##name(struct seq_file *m) \
+{ \
+ estruct; \
+} \
+ \
+static int __init create_stable_##name(void) \
+{ \
+ struct dentry *entry; \
+ struct dentry *dir; \
+ \
+ dir = eventfs_create_dir(#name, NULL); \
+ if (WARN(!dir, "Unable to create directory %s", #name)) \
+ return -1; \
+ \
+ entry = eventfs_create_file("format", 0644, dir, format_##name, \
+ &event_format_fops); \
+ \
+ return 0; \
+} \
+ \
+fs_initcall(create_stable_##name);
+
+#include <trace/stable_list.h>
--
1.7.1
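The multi-read header trick used by stable.h, event_reg.h and event_format.h can be tried out in a single userspace file: the event is described once as a list of __field()/__array() entries, and the same list is expanded twice under different definitions, first into a record layout and then into a writer for the format file. This is a minimal sketch, assuming TASK_COMM_LEN is 16. SCHED_SWITCH_FIELDS, emit() and format_sched_switch() are hypothetical names; each expansion carries its own terminator instead of using __SEP__, and the sketch uses C11 _Alignof and %zu rather than GCC's __alignof__ and %ld (size_t is not a long on 32-bit targets).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define TASK_COMM_LEN 16	/* assumption: value used by the kernel at the time */

/* (type)-1 compares below zero only when the type is signed. */
#define is_signed_type(type) (((type)(-1)) < (type)0)

/* The event is described once; __field()/__array() get bound per pass. */
#define SCHED_SWITCH_FIELDS \
	__array(char,  prev_comm,  TASK_COMM_LEN) \
	__field(pid_t, prev_pid) \
	__field(char,  prev_state) \
	__array(char,  next_comm,  TASK_COMM_LEN) \
	__field(pid_t, next_pid)

/* Pass 1: the list becomes a record layout. */
#define __field(type, item) type item;
#define __array(type, item, size) type item[size];
struct sched_switch_record { SCHED_SWITCH_FIELDS };
#undef __field
#undef __array

/* Append one line to buf, dropping it if it would not fit. */
static size_t emit(char *buf, size_t len, size_t n, const char *line)
{
	size_t l = strlen(line);

	if (n + l < len) {
		memcpy(buf + n, line, l + 1);	/* includes the NUL */
		n += l;
	}
	return n;
}

/* Pass 2: the same list becomes a writer for the "format" file. */
#define __field(type, item) \
	snprintf(line, sizeof(line), \
		 "\tfield:%s\ttype:%s\tsize:%zu\talign:%zu\tsigned:%d;\n", \
		 #item, #type, sizeof(type) * 8, \
		 (size_t)_Alignof(type), is_signed_type(type)); \
	n = emit(buf, len, n, line);
#define __array(type, item, size) \
	snprintf(line, sizeof(line), \
		 "\tarray:%s\ttype:%s\tsize:%zu\tcount:%d\talign:%zu\tsigned:%d;\n", \
		 #item, #type, sizeof(type) * 8, (size), \
		 (size_t)_Alignof(type), is_signed_type(type)); \
	n = emit(buf, len, n, line);

static size_t format_sched_switch(char *buf, size_t len)
{
	char line[128];
	size_t n = 0;

	if (len)
		buf[0] = '\0';
	SCHED_SWITCH_FIELDS
	return n;
}
#undef __field
#undef __array
```

Because the field list is written only once, the record layout and the published format cannot drift apart. Note that the signed: value reported for char depends on the architecture (char is signed on x86 but unsigned on ARM).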
* [RFC][PATCH 4/5] [PATCH 4/5] tracing/events: Add stable event sched_switch
2010-11-17 0:53 [RFC][PATCH 0/5] tracing/events: stable tracepoints Steven Rostedt
` (2 preceding siblings ...)
2010-11-17 0:54 ` [RFC][PATCH 3/5] [PATCH 3/5] tracing/events: Add infrastructure to show stable event formats Steven Rostedt
@ 2010-11-17 0:54 ` Steven Rostedt
2010-11-17 0:54 ` [RFC][PATCH 5/5] [PATCH 5/5] tracing/events: Add sched_migrate_task stable event Steven Rostedt
2010-11-17 20:14 ` [RFC][PATCH 0/5] tracing/events: stable tracepoints Mathieu Desnoyers
5 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2010-11-17 0:54 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers
[-- Attachment #1: 0004-tracing-events-Add-stable-event-sched_switch.patch --]
[-- Type: text/plain, Size: 2628 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Add the stable event for sched_switch.
[root@bxf ~]# cat /sys/kernel/event/sched_switch/format
array:prev_comm type:char size:8 count:16 align:1 signed:1;
field:prev_pid type:pid_t size:32 align:4 signed:1;
field:prev_state type:char size:8 align:1 signed:1;
array:next_comm type:char size:8 count:16 align:1 signed:1;
field:next_pid type:pid_t size:32 align:4 signed:1;
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/stable/sched.h | 20 ++++++++++++++++++++
include/trace/stable_list.h | 1 +
kernel/events/events.c | 22 ++++++++++++++++++++++
3 files changed, 43 insertions(+), 0 deletions(-)
create mode 100644 include/trace/stable/sched.h
diff --git a/include/trace/stable/sched.h b/include/trace/stable/sched.h
new file mode 100644
index 0000000..b5f4fd7
--- /dev/null
+++ b/include/trace/stable/sched.h
@@ -0,0 +1,20 @@
+#if !defined(_STABLE_SCHED_H) || defined(STABLE_HEADER_MULTI_READ)
+#define _STABLE_SCHED_H
+
+#include <linux/sched.h>
+
+/*
+ * Tracepoint for task switches, performed by the scheduler:
+ */
+STABLE_EVENT(sched_switch,
+
+ EVENT_STRUCT(
+ __array( char, prev_comm, TASK_COMM_LEN )__SEP__
+ __field( pid_t, prev_pid )__SEP__
+ __field( char, prev_state )__SEP__
+ __array( char, next_comm, TASK_COMM_LEN )__SEP__
+ __field( pid_t, next_pid )
+ )
+);
+
+#endif /* _STABLE_SCHED_H */
diff --git a/include/trace/stable_list.h b/include/trace/stable_list.h
index 996932a..9cbc006 100644
--- a/include/trace/stable_list.h
+++ b/include/trace/stable_list.h
@@ -1,2 +1,3 @@
/* New stable defines must be added here */
+#include <trace/stable/sched.h>
diff --git a/kernel/events/events.c b/kernel/events/events.c
index 6868bf1..f69e720 100644
--- a/kernel/events/events.c
+++ b/kernel/events/events.c
@@ -13,4 +13,26 @@
static struct mutex stable_event_mutex;
+#include <trace/events/sched.h>
+
+DECLARE_TRACE(stable_sched_switch,
+ TP_PROTO(char *prev_comm, pid_t prev_pid, char prev_state,
+ char *next_comm, pid_t next_pid),
+ TP_ARGS(prev_comm, prev_pid, prev_state, next_comm, next_pid));
+DEFINE_TRACE(stable_sched_switch);
+
+static const char stat_nam[] = TASK_STATE_TO_CHAR_STR;
+static void hook_sched_switch(void *ignore,
+ struct task_struct *prev,
+ struct task_struct *next)
+{
+ unsigned state;
+
+ state = prev->state ? __ffs(prev->state) + 1 : 0;
+ state = state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?';
+
+ trace_stable_sched_switch(prev->comm, prev->pid, state,
+ next->comm, next->pid);
+}
+
#include "event_reg.h"
--
1.7.1
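The prev_state letter computed in hook_sched_switch() takes the lowest set bit of the task state and uses it to index a letter table. Below is a minimal userspace sketch of that mapping, assuming the TASK_STATE_TO_CHAR_STR string of kernels from that period (verify it against include/linux/sched.h in the tree you build against) and using the GCC/Clang builtin __builtin_ctzl() as a stand-in for the kernel's __ffs(). The names task_state_char() and lowest_bit() are hypothetical.

```c
#include <assert.h>

/* Assumption: the state letters as defined in include/linux/sched.h of that era. */
#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKW"

static const char stat_nam[] = TASK_STATE_TO_CHAR_STR;

/* Stand-in for __ffs(): index of the lowest set bit; word must be non-zero. */
static unsigned long lowest_bit(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

/* Same mapping as hook_sched_switch(): 0 -> 'R', else lowest set bit + 1. */
static char task_state_char(unsigned long state)
{
	unsigned long idx = state ? lowest_bit(state) + 1 : 0;

	/* sizeof(stat_nam) - 1 excludes the terminating NUL */
	return idx < sizeof(stat_nam) - 1 ? stat_nam[idx] : '?';
}
```

A state with several bits set reports only its lowest bit, and any bit beyond the table yields '?', so the stable event always carries a single printable character.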
* [RFC][PATCH 5/5] [PATCH 5/5] tracing/events: Add sched_migrate_task stable event
2010-11-17 0:53 [RFC][PATCH 0/5] tracing/events: stable tracepoints Steven Rostedt
` (3 preceding siblings ...)
2010-11-17 0:54 ` [RFC][PATCH 4/5] [PATCH 4/5] tracing/events: Add stable event sched_switch Steven Rostedt
@ 2010-11-17 0:54 ` Steven Rostedt
2010-11-17 20:14 ` [RFC][PATCH 0/5] tracing/events: stable tracepoints Mathieu Desnoyers
5 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2010-11-17 0:54 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers
[-- Attachment #1: 0005-tracing-events-Add-sched_migrate_task-stable-event.patch --]
[-- Type: text/plain, Size: 1743 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Add the stable event for sched_migrate_task.
[root@bxf ~]# cat /sys/kernel/event/sched_migrate_task/format
array:comm type:char size:8 count:16 align:1 signed:1;
field:pid type:pid_t size:32 align:4 signed:1;
field:orig_cpu type:int size:32 align:4 signed:1;
field:dest_cpu type:int size:32 align:4 signed:1;
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/stable/sched.h | 13 +++++++++++++
kernel/events/events.c | 10 ++++++++++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/include/trace/stable/sched.h b/include/trace/stable/sched.h
index b5f4fd7..55d0e6f 100644
--- a/include/trace/stable/sched.h
+++ b/include/trace/stable/sched.h
@@ -17,4 +17,17 @@ STABLE_EVENT(sched_switch,
)
);
+/*
+ * Tracepoint for a task being migrated:
+ */
+STABLE_EVENT(sched_migrate_task,
+
+ EVENT_STRUCT(
+ __array( char, comm, TASK_COMM_LEN )__SEP__
+ __field( pid_t, pid )__SEP__
+ __field( int, orig_cpu )__SEP__
+ __field( int, dest_cpu )
+ )
+);
+
#endif /* _STABLE_SCHED_H */
diff --git a/kernel/events/events.c b/kernel/events/events.c
index f69e720..6ca2d2d 100644
--- a/kernel/events/events.c
+++ b/kernel/events/events.c
@@ -35,4 +35,14 @@ static void hook_sched_switch(void *ignore,
next->comm, next->pid);
}
+DECLARE_TRACE(stable_sched_migrate_task,
+ TP_PROTO(char *comm, pid_t pid, int orig_cpu, int dest_cpu),
+ TP_ARGS(comm, pid, orig_cpu, dest_cpu));
+DEFINE_TRACE(stable_sched_migrate_task);
+
+static void hook_sched_migrate_task(void *ignore, struct task_struct *p, int dest_cpu)
+{
+ trace_stable_sched_migrate_task(p->comm, p->pid, task_cpu(p), dest_cpu);
+}
+
#include "event_reg.h"
--
1.7.1
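A tool reading these format files only needs to split each line into its key:value pairs. This is a minimal sketch, assuming the tab-separated layout printed by event_format.h and single-word type names like the ones shown above (a type such as unsigned int would need a different tokenizer). struct fmt_item and parse_fmt_line() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed entry of an eventfs "format" file (hypothetical tool-side type). */
struct fmt_item {
	char kind[8];		/* "field" or "array" */
	char name[64];
	char type[64];
	unsigned bits;		/* size of one element, in bits */
	unsigned count;		/* 1 for plain fields */
	unsigned align;
	int is_signed;
};

/* Returns 0 on success, -1 if the line matches neither layout. */
static int parse_fmt_line(const char *line, struct fmt_item *it)
{
	memset(it, 0, sizeof(*it));
	/* The kernel prints a leading tab; accept any leading blanks. */
	while (*line == ' ' || *line == '\t')
		line++;

	/* A space in a scanf format matches any run of blanks, tabs included. */
	if (strncmp(line, "array:", 6) == 0) {
		strcpy(it->kind, "array");
		return sscanf(line,
			      "array:%63s type:%63s size:%u count:%u align:%u signed:%d;",
			      it->name, it->type, &it->bits, &it->count,
			      &it->align, &it->is_signed) == 6 ? 0 : -1;
	}
	if (strncmp(line, "field:", 6) == 0) {
		strcpy(it->kind, "field");
		it->count = 1;
		return sscanf(line,
			      "field:%63s type:%63s size:%u align:%u signed:%d;",
			      it->name, it->type, &it->bits,
			      &it->align, &it->is_signed) == 5 ? 0 : -1;
	}
	return -1;
}
```

The total size of an item in bits is bits * count, which lets a tool lay out a record without relying on the kernel's C headers.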
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 0:53 ` [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem Steven Rostedt
@ 2010-11-17 3:32 ` Greg KH
2010-11-17 10:39 ` Ingo Molnar
2010-11-17 12:16 ` Steven Rostedt
0 siblings, 2 replies; 24+ messages in thread
From: Greg KH @ 2010-11-17 3:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers
On Tue, Nov 16, 2010 at 07:53:58PM -0500, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> Copied mostly from debugfs, the eventfs is the filesystem
> that will include stable tracepoints. Currently nothing
> enables this filesystem as of this patch.
What? Wait, I wrote tracefs a long time ago just for this, why not take
that code and use it instead?
{sigh} And I just deleted that old tracefs git tree today as I thought
that idea was long gone and dead....
>
> Cc: Greg KH <gregkh@suse.de>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> fs/Kconfig | 6 +
> fs/Makefile | 1 +
> fs/eventfs/Makefile | 4 +
> fs/eventfs/file.c | 53 ++++++
> fs/eventfs/inode.c | 433 +++++++++++++++++++++++++++++++++++++++++++++++
This seems a bit big, I don't think you need all of this for some
reason. Are you sure you can't make it smaller?
thanks,
greg k-h
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 3:32 ` Greg KH
@ 2010-11-17 10:39 ` Ingo Molnar
2010-11-17 12:25 ` Steven Rostedt
2010-11-17 12:16 ` Steven Rostedt
1 sibling, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2010-11-17 10:39 UTC (permalink / raw)
To: Greg KH
Cc: Steven Rostedt, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
* Greg KH <gregkh@suse.de> wrote:
> On Tue, Nov 16, 2010 at 07:53:58PM -0500, Steven Rostedt wrote:
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > Copied mostly from debugfs, the eventfs is the filesystem that will include
> > stable tracepoints. Currently nothing enables this filesystem as of this patch.
>
> What? Wait, I wrote tracefs a long time ago just for this, why not take that code
> and use it instead?
Yeah, and i know that i suggested 'eventfs' to Steve and others in a prior thread a
few months ago - and i suspect Steve was following up on that suggestion with this
patch? So i guess it's partly my fault ;-)
[ Also, i think our _real_ problems with tracing lie entirely elsewhere, but i've
explained that numerous times. Maintaining instrumentation bits is the ultimate
cat herding experience ;-) ]
I also explained it in that eventfs suggestion thread that eventfs (or, indeed
tracefs) is IMO only a second tier approach compared to the real thing: proper
enumeration of events in sysfs.
[ Beyond the obvious compatibility detail that we are _NOT_ getting rid of
/debug/tracing/events/, as existing tooling depends on it. So unless eventfs or
sysfs integration brings some real tangible benefits over what we have already we
don't want to force tooling to migrate to yet another API. ]
Lin Ming and PeterZ are working on sysfs integration and they have posted several
iterations of that work which extends event details to sysfs. That work is not
complete yet and they need help. (I've Cc:-ed them.)
The sysfs approach has numerous upsides:
- Design: sysfs is a mature, multi-year project with tons of meaningful hardware
and software hierarchies already well established. Attaching events to these
existing nodes optionally is an obvious advantage and avoids duplication and
forces people to think about structure.
- Concentration of structure: subsystem and driver authors/maintainers already care
about their sysfs layout - and when they define new tracepoints for subsystem or
driver instrumentation it would be very natural for those events to go somewhere
nearby, in the existing sysfs hierarchy.
- Practicalities: sysfs is already mounted on all distros so tooling could rely on
it universally. It's the ultimate 'describe system structure' store.
- Long term maintenance: we want to be strict with events, i.e. keep the
descriptors read only and single-line structured. You sysfs folks are enforcing
that pretty well - with eventfs we'd always have the nasty lure to apply API
hacks to eventfs components when we really shouldn't ...
Eventfs has a couple of downsides:
- Design: it's slapping events into a separate, partly duplicated, partly unique,
partly inconsistent set of hierarchies. We can deal with it, but it's not
particularly intelligent and i'd like us to try harder.
- Practicalities: eventfs has to be mounted on every distro. It's an uphill climb
in general and the appeal of an approach has to be _strong_ for this to be
feasible.
So putting it into sysfs looks like a pretty intelligent solution all around and i'd
prefer it.
Steve, would you be interested in helping out Lin Ming and PeterZ with the sysfs
work - or at least help them come to the conclusion that we want eventfs?
Thanks,
Ingo
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 3:32 ` Greg KH
2010-11-17 10:39 ` Ingo Molnar
@ 2010-11-17 12:16 ` Steven Rostedt
1 sibling, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2010-11-17 12:16 UTC (permalink / raw)
To: Greg KH
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers
On Tue, 2010-11-16 at 19:32 -0800, Greg KH wrote:
> On Tue, Nov 16, 2010 at 07:53:58PM -0500, Steven Rostedt wrote:
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > Copied mostly from debugfs, the eventfs is the filesystem
> > that will include stable tracepoints. Currently nothing
> > enables this filesystem as of this patch.
>
> What? Wait, I wrote tracefs a long time ago just for this, why not take
> that code and use it instead?
Because:
1) I couldn't find it (I thought you called it eventfs, so I was
searching for the wrong thing).
2) You never answered my ping on IRC
3) I was in a rush to get something out.
;-)
I'm perfectly fine with swapping this out. This is only an RFC anyway.
This is also why I Cc'd you on this patch.
>
> {sigh} And I just deleted that old tracefs git tree today as I thought
> that idea was long gone and dead....
{sigh} and it may still be dead :-(
-- Steve
>
> >
> > Cc: Greg KH <gregkh@suse.de>
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> > fs/Kconfig | 6 +
> > fs/Makefile | 1 +
> > fs/eventfs/Makefile | 4 +
> > fs/eventfs/file.c | 53 ++++++
> > fs/eventfs/inode.c | 433 +++++++++++++++++++++++++++++++++++++++++++++++
>
> This seems a bit big, I don't think you need all of this for some
> reason. Are you sure you can't make it smaller?
>
> thanks,
>
> greg k-h
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 10:39 ` Ingo Molnar
@ 2010-11-17 12:25 ` Steven Rostedt
2010-11-17 15:03 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Steven Rostedt @ 2010-11-17 12:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: Greg KH, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
On Wed, 2010-11-17 at 11:39 +0100, Ingo Molnar wrote:
> * Greg KH <gregkh@suse.de> wrote:
>
> > On Tue, Nov 16, 2010 at 07:53:58PM -0500, Steven Rostedt wrote:
> > > From: Steven Rostedt <srostedt@redhat.com>
> > >
> > > Copied mostly from debugfs, the eventfs is the filesystem that will include
> > > stable tracepoints. Currently nothing enables this filesystem as of this patch.
> >
> > What? Wait, I wrote tracefs a long time ago just for this, why not take that code
> > and use it instead?
>
> Yeah, and i know that i suggested 'eventfs' to Steve and others in a prior thread a
> few months ago - and i suspect Steve was following up on that suggestion with this
> patch? So i guess it's partly my fault ;-)
And we brought this up at Kernel Summit.
>
> [ Also, i think our _real_ problems with tracing lie entirely elsewhere, but i've
> explained that numerous times. Maintaining instrumentation bits is the ultimate
> cat herding experience ;-) ]
>
> I also explained it in that eventfs suggestion thread that eventfs (or, indeed
> tracefs) is IMO only a second tier approach compared to the real thing: proper
> enumeration of events in sysfs.
>
> [ Beyond the obvious compatibility detail that we are _NOT_ getting rid of
> /debug/tracing/events/, as existing tooling depends on it. So unless eventfs or
> sysfs integration brings some real tangible benefits over what we have already we
> don't want to force tooling to migrate to yet another API. ]
One benefit is that we have a way to distinguish between
in-field-debugging tracepoints and tracepoints that are only for
analysis tools.
>
> Lin Ming and PeterZ are working on sysfs integration and they have posted several
> iterations of that work which extends event details to sysfs. That work is not
> complete yet and they need help. (I've Cc:-ed them.)
Actually, I suck at adding anything to the sysfs/kobject code. I always
screw it up. I only got the /sys/kernel/events working because I copied
it directly from Greg. I doubt I'd be much help.
>
> The sysfs approach has numerous upsides:
>
> - Design: sysfs is a mature, multi-year project with tons of meaningful hardware
> and software hierarchies already well established. Attaching events to these
> existing nodes optionally is an obvious advantage and avoids duplication and
> forces people to think about structure.
>
> - Concentration of structure: subsystem and driver authors/maintainers already care
> about their sysfs layout - and when they define new tracepoints for subsystem or
> driver instrumentation it would be very natural for those events to go somewhere
> nearby, in the existing sysfs hierarchy.
>
> - Practicalities: sysfs is already mounted on all distros so tooling could rely on
> it universally. It's the ultimate 'describe system structure' store.
>
> - Long term maintenance: we want to be strict with events, i.e. keep the
> descriptors read only and single-line structured. You sysfs folks are enforcing
> that pretty well - with eventfs we'd always have the nasty lure to apply API
> hacks to eventfs components when we really shouldn't ...
Are these events now going to be labeled as stable? Must every tracepoint
we have keep the same data? Linus specifically said at Kernel
Summit that he wants absolutely NO modules to have a stable tracepoint.
Also, if we just blindly label a tracepoint as "stable" then we must
keep all its contents. For example, the sched_switch will contain the
priority. As Peter has stated several times, that may go away. We also
do not want to lose getting that information, as a lot of us use it.
>
> Eventfs has a couple of downsides:
>
> - Design: it's slapping events into a separate, partly duplicated, partly unique,
> partly inconsistent set of hierarchies. We can deal with it, but it's not
> particularly intelligent and i'd like us to try harder.
>
> - Practicalities: eventfs has to be mounted on every distro. It's an uphill climb
> in general and the appeal of an approach has to be _strong_ for this to be
> feasible.
Some distros already mount debugfs by default. It's a oneliner in fstab.
>
> So putting it into sysfs looks like a pretty intelligent solution all around and i'd
> prefer it.
Another downside is that you need to scan hundreds of directories to
find tracepoints. And again, are they all now stable?
>
> Steve, would you be interested in helping out Lin Ming and PeterZ with the sysfs
> work - or at least help them come to the conclusion that we want eventfs?
I don't think I would be much help with the former, and I'm thinking I'm
losing the latter.
Hmm, it seems that every decision we agreed on at Kernel
Summit has been declined in practice. Makes me think that Kernel Summit
is pointless, and was a waste of my time. :-(
-- Steve
>
> Thanks,
>
> Ingo
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 12:25 ` Steven Rostedt
@ 2010-11-17 15:03 ` Ingo Molnar
2010-11-17 15:16 ` Peter Zijlstra
` (2 more replies)
2010-11-17 15:16 ` Peter Zijlstra
2010-11-17 17:46 ` Mathieu Desnoyers
2 siblings, 3 replies; 24+ messages in thread
From: Ingo Molnar @ 2010-11-17 15:03 UTC (permalink / raw)
To: Steven Rostedt
Cc: Greg KH, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
* Steven Rostedt <rostedt@goodmis.org> wrote:
> Are these events now going to be labeled as stable? Must every tracepoint we have
> keep the same data? Linus specifically said at Kernel Summit that he wants
> absolutely NO modules to have a stable tracepoint.
I think you are worrying about the wrong things.
I think Arjan's complaints at the KS stemmed from prior sporadic declarations on
lkml that there is no tracepoint ABI _at all_, and that powertop/latencytop could
break anytime.
But in reality i strongly disagree with such declarations, and tracepoint data that
is used by PowerTop/timechart/latencytop or perf is and was an ABI, simple as that -
and i've been enforcing that for two years. (We have so few good instrumentation
tools that we _really_ don't want to break them.)
At that point, realizing that we have an ABI for existing tools, i think it's
fundamentally misguided to go out on a limb trying to put barriers in the way of
other tools that do not even exist to begin with ...
Our real problem with tracing is lack of relevance, lack of utility, lack of
punch-through analytical power.
Trying to create a sandbox to _reduce utility_ is like the last step, and a really
optional step, when we have such variety that we want some control over it. It's
always expensive, it always reduces the tool space as collateral damage.
So please don't think of sysfs or eventfs as a tool to restrict. Think of it as a
tool to _organize_.
Again, i'd _LOVE_ to have the 'problem' of us having so many tools that analyze
application and kernel behavior in such a rich way that they use tracepoints that
were not supposed to be 'stable'.
I simply don't see the 'problem' that is being solved here. We had a stable ABI and
we didn't break sysprof or powertop/latencytop in the past and won't break it in the
future either.
Thanks,
Ingo
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 12:25 ` Steven Rostedt
2010-11-17 15:03 ` Ingo Molnar
@ 2010-11-17 15:16 ` Peter Zijlstra
2010-11-23 21:29 ` Steven Rostedt
2010-11-17 17:46 ` Mathieu Desnoyers
2 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2010-11-17 15:16 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Frederic Weisbecker, Linus Torvalds,
Theodore Tso, Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo
On Wed, 2010-11-17 at 07:25 -0500, Steven Rostedt wrote:
> On Wed, 2010-11-17 at 11:39 +0100, Ingo Molnar wrote:
> > I also explained it in that eventfs suggestion thread that eventfs (or, indeed
> > tracefs) is IMO only a second tier approach compared to the real thing: proper
> > enumeration of events in sysfs.
> >
> > [ Beyond the obvious compatibility detail that we are _NOT_ getting rid of
> > /debug/tracing/events/, as existing tooling depends on it. So unless eventfs or
> > sysfs integration brings some real tangible benefits over what we have already we
> > don't want to force tooling to migrate to yet another API. ]
>
> One benefit is that we have a way to distinguish between
> in-field-debugging tracepoints and tracepoints that are only for
> analysis tools.
Right, and there was a clear consensus that that was a desired thing,
have a strong signal towards tool authors to request 'stable'
tracepoints or live with the fact they have to cope with the thing
changing.
As to the whole /debug/tracing/events thing, we can deprecate it like we
have done so many interfaces, provided we have indeed provided an
alternative interface.
> > Lin Ming and PeterZ are working on sysfs integration and they have posted several
> > iterations of that work which extends event details to sysfs. That work is not
> > complete yet and they need help. (I've Cc:-ed them.)
>
> Actually, I suck at adding anything to the sysfs/kobject code. I always
> screw it up. I only got the /sys/kernel/events working because I copied
> it directly from Greg. I doubt I'd be much help.
Yeah, I suck at it too, sysfs is scary. Luckily Kay is helping me out.
> Are these events now going to be labeled as stable?
For now I'm concentrating on the hardware events, those are more or less
stable in that if you run it on the same hardware you get the same thing
counted. Different hardware may miss some events since it simply cannot
provide anything resembling, or count something related but slightly
different (speculative vs retired events, or different cache level, or a
slightly different definition of miss etc..)
Basically the same we already have with the perf 'generic' events.
> Must every tracepoint
> we have keep the same data? Linus specifically said at Kernel
> Summit that he wants absolutely NO modules to have a stable tracepoint.
>
> Also, if we just blindly label a tracepoint as "stable" then we must
> keep all its contents. For example, the sched_switch will contain the
> priority. As Peter has stated several times, that may go away. We also
> do not want to lose getting that information, as a lot of us use it.
For now I've not at all looked at representing tracepoints in sysfs, for
the hardware bits I'm looking to place the event_source (formerly known
as PMU) in the hardware topology already present in /sys/devices/.
One suggestion was to place the software and tracepoint events
in /sys/kernel/ some place. Another was to place driver specific, say
wifi things near the wifi driver node.
None of the proposals have dealt with the stable vs debug thing, simply
because none of them were post KS.
> Some distros already mount debugfs by default. It's a oneliner in fstab.
I haven't yet seen that... maybe it's automounted on /sys/kernel/debug/ or
some daft place like that by magic initscripts outside of fstab, but I'd
not notice that.
> >
> > So putting it into sysfs looks like a pretty intelligent solution all around and i'd
> > prefer it.
>
> Another downside is that you need to scan hundreds of directories to
> find tracepoints. And again, are they all now stable?
Not really, they'd be accessible through the bus structure, something
like:
/sys/bus/event_source/*/events/*
Sure, that's more than 1 directory, but then so
is /debug/tracing/events/*/*/
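A minimal Python sketch of the directory comparison above, assuming the /sys/bus/event_source/*/events/* layout proposed here (a proposal in this thread, which may not exist on a given kernel) and the existing debugfs events tree mounted at /sys/kernel/debug. The helper name list_events is hypothetical. glob returns an empty list when a path is missing or unreadable, so the sketch runs even where neither layout is present.

```python
import glob
import os


def list_events(root: str, pattern: str) -> list:
    """Return paths under `root` matching `pattern`, relative to `root`, sorted."""
    matches = glob.glob(os.path.join(root, pattern))
    return sorted(os.path.relpath(path, root) for path in matches)


if __name__ == "__main__":
    # Proposed sysfs layout (assumption: may not exist on this kernel).
    sysfs_events = list_events("/sys", "bus/event_source/*/events/*")
    # Existing debugfs layout; matching each event's "format" file skips
    # per-subsystem control files such as "enable" and "filter".
    debugfs_events = list_events("/sys/kernel/debug", "tracing/events/*/*/format")
    print(f"sysfs: {len(sysfs_events)} events, debugfs: {len(debugfs_events)} events")
```

Either way a tool walks a two-level wildcard, which is Peter's point: the sysfs layout costs no more scanning than the current one.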
> > Steve, would you be interested in helping out Lin Ming and PeterZ with the sysfs
> > work - or at least help them come to the conclusion that we want eventfs?
>
> I don't think I would be much help with the former, and I'm thinking I'm
> losing the latter.
Yeah, eventfs simply won't work for what we want to do with hardware
events.
> Hmm, seems that every decision that we came to agreement with at Kernel
> Summit has been declined in practice. Makes me think that Kernel Summit
> is pointless, and was a waste of my time. :-(
Well, I don't know, clearly Ingo seems to disagree, but then he wasn't
at KS. Thomas and I were, and neither of us really sees a problem with
the stable vs debug thing (at least, I didn't hear tglx protest).
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 15:03 ` Ingo Molnar
@ 2010-11-17 15:16 ` Peter Zijlstra
2010-11-17 15:16 ` Steven Rostedt
2010-11-17 18:42 ` Ted Ts'o
2 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2010-11-17 15:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Rostedt, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Frederic Weisbecker, Linus Torvalds,
Theodore Tso, Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo
On Wed, 2010-11-17 at 16:03 +0100, Ingo Molnar wrote:
> I think Arjan's complaints at the KS stemmed from prior sporadic declarations on
> lkml that there is no tracepoint ABI _at all_, and that powertop/latencytop could
> break anytime.
And it will, afaik Arjan refused to even parse the format file which is
part of the tracepoint abi and I'll be changing those for the scheduler.
I really object to not being able to make sane changes just because some
tool is too lazy to even implement the full ABI that was exposed.
I fully intend to break powertop/latencytop if they refuse to use the
format file, deal with it.
Also, in the unlikely event we need to re-order the task->state bits
I'll do so without a moment's hesitation, regardless of who consumes them
through the scheduler tracepoints, that's simply not stuff that should
be tied down.
The same for anything that tries to interpret task->prio through the
tracepoints.
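Peter's point is that a tool should read each field's offset and size from the event's format file instead of hard-coding a record layout. A minimal Python sketch of that approach follows. It assumes the debugfs format-file lines of the form "field:pid_t prev_pid; offset:24; size:4; signed:1;" (tab-separated). The helper names are hypothetical, the offsets in the test sample are illustrative rather than taken from a real kernel, and integers are decoded in host byte order by default.

```python
import re
import sys

# One field line from a debugfs event "format" file, e.g.
#   field:pid_t prev_pid;<TAB>offset:24;<TAB>size:4;<TAB>signed:1;
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);"
    r"\s*size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format(text):
    """Map each field name to (offset, size, is_signed); offsets/sizes in bytes."""
    fields = {}
    for m in FIELD_RE.finditer(text):
        # The name is the last token of the C declaration, minus any "[N]".
        name = m.group("decl").split()[-1].split("[")[0]
        fields[name] = (int(m.group("offset")), int(m.group("size")),
                        m.group("signed") == "1")
    return fields


def read_int_field(fields, record, name, byteorder=sys.byteorder):
    """Decode an integer field by name rather than by a hard-coded offset."""
    offset, size, is_signed = fields[name]
    return int.from_bytes(record[offset:offset + size], byteorder,
                          signed=is_signed)
```

Because only names are looked up, a tool written this way keeps working when a field moves; it still fails (here with a KeyError) if a field it needs is removed.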
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 15:03 ` Ingo Molnar
2010-11-17 15:16 ` Peter Zijlstra
@ 2010-11-17 15:16 ` Steven Rostedt
2010-11-17 15:35 ` Peter Zijlstra
2010-11-17 18:42 ` Ted Ts'o
2 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2010-11-17 15:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Greg KH, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra, Christoph Hellwig
On Wed, 2010-11-17 at 16:03 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Are these events now going to be labeled as stable? Must every tracepoint we have
> > keep the same data? Linus specifically said at Kernel Summit that he wants
> > absolutely NO modules to have a stable tracepoint.
>
> I think you are worrying about the wrong things.
>
> I think Arjan's complaints at the KS stemmed from prior sporadic declarations on
> lkml that there is no tracepoint ABI _at all_, and that powertop/latencytop could
> break anytime.
>
> But in reality i strongly disagree with such declarations, and tracepoint data that
> is used by PowerTop/timechart/latencytop or perf is and was an ABI, simple as that -
> and i've been enforcing that for two years. (We have so few good instrumentation
> tools that we _really_ don't want to break them.)
>
> At that point, realizing that we have an ABI for existing tools, i think it's
> fundamentally misguided to go out on a limb trying to put barriers in the way of
> other tools that do not even exist to begin with ...
>
> Our real problem with tracing is lack of relevance, lack of utility, lack of
> punch-through analytical power.
>
> Trying to create a sandbox to _reduce utility_ is like the last step, and a really
> optional step, when we have such variety that we want some control over it. It's
> always expensive, it always reduces the tool space as collateral damage.
>
> So please don't think of sysfs or eventfs as a tool to restrict. Think of it as a
> tool to _organize_.
>
> Again, i'd _LOVE_ to have the 'problem' of us having so many tools that analyze
> application and kernel behavior in such a rich way that they use tracepoints that
> were not supposed to be 'stable'.
>
> I simply don't see the 'problem' that is being solved here. We had a stable ABI and
> we didn't break sysprof or powertop/latencytop in the past and won't break it in the
> future either.
What about a tool that picks up tracepoints that were only used by a
developer for in-field debugging, and then that tracepoint disappears
because of a design change. Is it OK for that tool to break with it?
Do all tools that use tracepoints require a "check" feature?
I guess the problem is that creators of the tools to analyze the kernel
have no idea of what they can count on and what they can't. Do we need a
process to have these tool creators request to developers to "keep this
tracepoint"?
-- Steve
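One possible reading of the "check" feature asked about above: before enabling a feature, a tool verifies that the event still exists and that its format still lists the fields it needs, and degrades gracefully otherwise. A minimal sketch, assuming the debugfs layout <events_root>/<subsystem>/<event>/format. The function name tracepoint_usable is hypothetical.

```python
import os
import re


def tracepoint_usable(events_root, subsystem, event, required_fields):
    """Return True if the event exists and its format lists every required field."""
    fmt_path = os.path.join(events_root, subsystem, event, "format")
    try:
        with open(fmt_path) as f:
            text = f.read()
    except OSError:
        # Event removed or renamed, or tracing not mounted/accessible.
        return False
    # Field names are the last token of each "field:<decl>;" declaration.
    names = {decl.split()[-1].split("[")[0]
             for decl in re.findall(r"field:([^;]+);", text)}
    return set(required_fields) <= names


if __name__ == "__main__":
    # Assumes debugfs is mounted at /sys/kernel/debug.
    root = "/sys/kernel/debug/tracing/events"
    if tracepoint_usable(root, "sched", "sched_switch", ["prev_pid", "next_pid"]):
        print("sched_switch usable")
    else:
        print("sched_switch unavailable; disabling context-switch view")
```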
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 15:16 ` Steven Rostedt
@ 2010-11-17 15:35 ` Peter Zijlstra
0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2010-11-17 15:35 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Frederic Weisbecker, Linus Torvalds,
Theodore Tso, Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo, Christoph Hellwig
On Wed, 2010-11-17 at 10:16 -0500, Steven Rostedt wrote:
> What about a tool that picks up tracepoints that were only used by a
> developer for in-field debugging, and then that tracepoint disappears
> because of a design change. Is it OK for that tool to break with it?
>
> Do all tools that use tracepoints require a "check" feature?
Not sure what you mean by a 'check' feature, but I do think it's useful
to tool authors to clearly delineate between stable and debug
tracepoints; that also facilitates
> I guess the problem is that creators of the tools to analyze the kernel
> have no idea of what they can count on and what they can't. Do we need a
> process to have these tool creators request to developers to "keep this
> tracepoint"?
the process mentioned here, if they cannot find the data they want
through existing stable tracepoints they have to request it, or
otherwise clearly suffer breakage whenever we feel like changing stuff.
A nice aspect of devs coming to us and requesting data is that we get a
fairly good idea of what tools are available and get to discuss the
problem they're trying to solve.
Esp. that latter point is a very good one imho, we might have a totally
different view on some of the problems :-)
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 12:25 ` Steven Rostedt
2010-11-17 15:03 ` Ingo Molnar
2010-11-17 15:16 ` Peter Zijlstra
@ 2010-11-17 17:46 ` Mathieu Desnoyers
2010-11-17 17:52 ` Steven Rostedt
2 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2010-11-17 17:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
Linus Torvalds, Theodore Tso, Arjan van de Ven, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
* Steven Rostedt (rostedt@goodmis.org) wrote:
[...]
> Are these events now going to be labeled as stable? Must every tracepoint
> we have keep the same data? Linus specifically said at Kernel
> Summit that he wants absolutely NO modules to have a stable tracepoint.
I'd like to bring up the point of KVM tracepoints here. KVM can be configured as
a module, and may clearly contain tracepoints that we'd like to be stable.
My thought is that what we really want to enforce is "no stable tracepoints in
drivers" rather than in "modules", but I might be wrong.
Thoughts ?
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 17:46 ` Mathieu Desnoyers
@ 2010-11-17 17:52 ` Steven Rostedt
2010-11-17 18:12 ` Mathieu Desnoyers
2010-11-17 23:48 ` Ted Ts'o
0 siblings, 2 replies; 24+ messages in thread
From: Steven Rostedt @ 2010-11-17 17:52 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Ingo Molnar, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
Linus Torvalds, Theodore Tso, Arjan van de Ven, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
On Wed, 2010-11-17 at 12:46 -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> [...]
> > Are these events now going to be labeled as stable? Must every tracepoint
> > we have keep the same data? Linus specifically said at Kernel
> > Summit that he wants absolutely NO modules to have a stable tracepoint.
>
> I'd like to bring up the point of KVM tracepoints here. KVM can be configured as
> a module, and may clearly contain tracepoints that we'd like to be stable.
>
> My thought is that what we really want to enforce is "no stable tracepoints in
> drivers" rather than in "modules", but I might be wrong.
>
> Thoughts ?
I still say no to stable tracepoints in modules. Once you open that
door, everyone will have it.
But that doesn't mean that a raw tracepoint can't be stable. If the
maintainer of that tracepoint states it is stable, then by all means,
let tools use it.
-- Steve
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 17:52 ` Steven Rostedt
@ 2010-11-17 18:12 ` Mathieu Desnoyers
2010-11-18 9:42 ` Avi Kivity
2010-11-17 23:48 ` Ted Ts'o
1 sibling, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2010-11-17 18:12 UTC (permalink / raw)
To: Steven Rostedt, Avi Kivity
Cc: Ingo Molnar, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
Linus Torvalds, Theodore Tso, Arjan van de Ven, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Wed, 2010-11-17 at 12:46 -0500, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > [...]
> > > Are these events now going to be labeled as stable? Must every tracepoint
> > > we have keep the same data? Linus specifically said at Kernel
> > > Summit that he wants absolutely NO modules to have a stable tracepoint.
> >
> > I'd like to bring up the point of KVM tracepoints here. KVM can be configured as
> > a module, and may clearly contain tracepoints that we'd like to be stable.
> >
> > My thought is that what we really want to enforce is "no stable tracepoints in
> > drivers" rather than in "modules", but I might be wrong.
> >
> > Thoughts ?
>
> I still say no to stable tracepoints in modules. Once you open that
> door, everyone will have it.
>
> But that doesn't mean that a raw tracepoint can't be stable. If the
> maintainer of that tracepoint states it is stable, then by all means,
> let tools use it.
I'd really like to hear Avi's thoughts on this.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 15:03 ` Ingo Molnar
2010-11-17 15:16 ` Peter Zijlstra
2010-11-17 15:16 ` Steven Rostedt
@ 2010-11-17 18:42 ` Ted Ts'o
2 siblings, 0 replies; 24+ messages in thread
From: Ted Ts'o @ 2010-11-17 18:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Rostedt, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
Linus Torvalds, Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
On Wed, Nov 17, 2010 at 04:03:32PM +0100, Ingo Molnar wrote:
>
> I think Arjan's complaints at the KS stemmed from prior sporadic
> declarations on lkml that there is no tracepoint ABI _at all_, and
> that powertop/latencytop could break anytime.
>
> But in reality i strongly disagree with such declarations, and
> tracepoint data that is used by PowerTop/timechart/latencytop or
> perf is and was an ABI, simple as that - and i've been enforcing
> that for two years. (We have so few good instrumentation tools that
> we _really_ don't want to break them.)
There was general consensus at the kernel summit that the tracepoints,
being deliberately placed in /sys/kernel/debug, were unstable and today
expose internal implementation details, and that having people put
different tracepoints in Documentation/ABI/{testing,stable,unstable}
simply doesn't work.
Heck, I've recently had to make changes to tracepoints because
otherwise perf would fall over dead when it tripped over an ext4
tracepoint, due to its limitations dealing with what we could put into
TP_printk(). Saying that tracepoints are stable ABI that must never
change, forever and ever, amen, is simply a non-starter.
> I simply don't see the 'problem' that is being solved here. We had a
> stable ABI and we didn't break sysprof or powertop/latencytop in the
> past and won't break it in the future either.
I think the general consensus is that we didn't have a stable ABI, and
the tension between what kernel developers need for debugging, which
of necessity is kernel version specific, and what gets exposed as a
stable ABI that tools can depend upon, is not one that can be
resolved.
Regards,
- Ted
* Re: [RFC][PATCH 0/5] tracing/events: stable tracepoints
2010-11-17 0:53 [RFC][PATCH 0/5] tracing/events: stable tracepoints Steven Rostedt
` (4 preceding siblings ...)
2010-11-17 0:54 ` [RFC][PATCH 5/5] [PATCH 5/5] tracing/events: Add sched_migrate_task stable event Steven Rostedt
@ 2010-11-17 20:14 ` Mathieu Desnoyers
5 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2010-11-17 20:14 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Linus Torvalds, Theodore Tso,
Arjan van de Ven
* Steven Rostedt (rostedt@goodmis.org) wrote:
> We also have name (redundant), ID (should be agnostic), and print_fmt
> (lots of issues).
>
> So the new format looks like this:
>
> [root@bxf ~]# cat /sys/kernel/event/sched_switch/format
> array:prev_comm type:char size:8 count:16 align:1 signed:1;
> field:prev_pid type:pid_t size:32 align:4 signed:1;
> field:prev_state type:char size:8 align:1 signed:1;
> array:next_comm type:char size:8 count:16 align:1 signed:1;
> field:next_pid type:pid_t size:32 align:4 signed:1;
Hrm, this is mixing field and type definitions. How about we organize this in
something that will be both parseable and extensible ?
First, I don't see what exporting the kernel-internal type "pid_t" in there
gives you. Userspace knows nothing about this, so it seems pretty useless.
What do you think of this alternative layout ?
Named types below:
% cat /sys/kernel/event/types/char
parent = integer;
size = 8;
signed = true;
align = 8;
% cat /sys/kernel/event/types/pid_t
parent = integer;
size = 32;
signed = true;
align = 32; /* Or 8 if the architecture supports unaligned writes
efficiently */
% cat /sys/kernel/event/sched_switch/format
type { /* Nameless type */
parent = struct;
fields = {
{
type { /* Nameless type */
parent = array;
length = 16;
elem_type = char; /* refers to named type */
},
prev_comm,
},
{ pid_t, prev_pid, },
{ char, prev_state, },
{
type { /* Nameless type */
parent = array;
length = 16;
elem_type = char; /* refers to named type */
},
next_comm,
},
{ pid_t, next_pid, },
};
}
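To see why the align attribute matters, here is a minimal Python sketch that computes bit offsets for the sched_switch fields from the named types above (char: 8 bits, align 8; pid_t: 32 bits, align 32). The classes and functions are hypothetical. Arrays take the element's alignment, and no trailing padding is added after the last field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    size: int    # in bits
    align: int   # in bits
    signed: bool


@dataclass(frozen=True)
class ArrayType:
    elem: IntType
    length: int


def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


def layout(fields):
    """Return ({field name: bit offset}, total bits) for (type, name) pairs."""
    offsets, pos = {}, 0
    for ftype, name in fields:
        if isinstance(ftype, ArrayType):
            align = ftype.elem.align
            size = align_up(ftype.elem.size, ftype.elem.align) * ftype.length
        else:
            align, size = ftype.align, ftype.size
        pos = align_up(pos, align)
        offsets[name] = pos
        pos += size
    return offsets, pos


# Named types from the example above.
CHAR = IntType(size=8, align=8, signed=True)
PID_T = IntType(size=32, align=32, signed=True)

SCHED_SWITCH = [
    (ArrayType(CHAR, 16), "prev_comm"),
    (PID_T, "prev_pid"),
    (CHAR, "prev_state"),
    (ArrayType(CHAR, 16), "next_comm"),
    (PID_T, "next_pid"),
]
```

With these types, next_pid lands at bit 320 rather than 296, because it is padded to a 32-bit boundary after next_comm; with the "align = 8" variant for pid_t it would stay at 296.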
With this layout, we can declare types like enumerations, e.g.
% cat /sys/kernel/event/types/trap_id_t
type {
parent = enum;
size = 5; /* 5-bit bitfield to hold the enumeration */
signed = false;
align = 1; /* bit-packed */
map = {
{ 0, "divide error" },
{ 2, "nmi stack" },
{ 4, "overflow" },
....
};
}
So we can refer to this "named type" in all events for which we want to save
the trap ID, and we therefore get the mapping to a human-understandable name for free.
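A minimal sketch of how a consumer could use the trap_id_t named type: extract the 5-bit unsigned field and look the value up in the map, falling back to the raw number for unmapped values. Only the three entries shown in the mail are included (the rest are elided there). The LSB-first bit numbering and the helper names are assumptions.

```python
# Excerpt of the trap_id_t map from the mail; the remaining entries are elided there.
TRAP_ID_MAP = {0: "divide error", 2: "nmi stack", 4: "overflow"}


def extract_bits(word, bit_offset, width):
    """Return the unsigned bitfield of `width` bits starting at `bit_offset`.

    Bit 0 is the least significant bit (an assumption; the layout in the
    mail does not fix a bit order)."""
    return (word >> bit_offset) & ((1 << width) - 1)


def decode_enum(word, bit_offset, width, mapping):
    """Map a bit-packed enum value to its name, or to its number if unmapped."""
    value = extract_bits(word, bit_offset, width)
    return mapping.get(value, str(value))
```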
> Some notes:
>
> o The size is in bits.
Yep, this will immensely help when dealing with bitfields.
> o We added an align, that is the natural alignment for the arch of that
> type.
Just watch out, in your initial example, I think your align field is in bytes
rather than bits. Ideally we'd like everything to be consistent.
Thanks,
Mathieu
> o We added an "array" type, that specifies the size of an element as
> well as a "count", where total size can be align(size) * count.
> o We separated the field name from the type.
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
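A minimal Python sketch of parsing the flat format proposed in the RFC and applying its align(size) * count rule for arrays. Following Mathieu's observation, it assumes align is in bytes (as in the example, where pid_t has size:32 align:4) and converts it to bits. The function names are hypothetical.

```python
def parse_line(line):
    """Parse 'kind:name key:value ... ;' into a dict (type stays a string)."""
    tokens = line.strip().rstrip(";").split()
    kind, name = tokens[0].split(":", 1)
    entry = {"kind": kind, "name": name}
    for token in tokens[1:]:
        key, value = token.split(":", 1)
        entry[key] = value if key == "type" else int(value)
    return entry


def total_bits(entry, align_unit_bits=8):
    """Total size in bits; 'align' is assumed to be in bytes (unit = 8 bits)."""
    if entry["kind"] != "array":
        return entry["size"]
    align_bits = entry["align"] * align_unit_bits
    stride = (entry["size"] + align_bits - 1) // align_bits * align_bits
    return stride * entry["count"]
```

Passing align_unit_bits=1 would instead treat align as bits, the unit Mathieu suggests using consistently with size.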
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 17:52 ` Steven Rostedt
2010-11-17 18:12 ` Mathieu Desnoyers
@ 2010-11-17 23:48 ` Ted Ts'o
2010-11-18 13:05 ` Mathieu Desnoyers
1 sibling, 1 reply; 24+ messages in thread
From: Ted Ts'o @ 2010-11-17 23:48 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, Ingo Molnar, Greg KH, linux-kernel,
Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Linus Torvalds, Arjan van de Ven, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
On Wed, Nov 17, 2010 at 12:52:34PM -0500, Steven Rostedt wrote:
>
> I still say no to stable tracepoints in modules. Once you open that
> door, everyone will have it.
What about having KVM define the stable tracepoints in the built-in
part of the kernel, if CONFIG_KVM is Y or M, and then export the
tracepoints, such that the tracepoints can be called from a module?
That way the tracepoints aren't being *defined* in a module, they are
just being *called* from the module. Does that seem like a reasonable
compromise?
- Ted
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 18:12 ` Mathieu Desnoyers
@ 2010-11-18 9:42 ` Avi Kivity
0 siblings, 0 replies; 24+ messages in thread
From: Avi Kivity @ 2010-11-18 9:42 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Steven Rostedt, Ingo Molnar, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
Linus Torvalds, Theodore Tso, Arjan van de Ven, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra
On 11/17/2010 08:12 PM, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> >
> > I still say no to stable tracepoints in modules. Once you open that
> > door, everyone will have it.
> >
> > But that doesn't mean that a raw tracepoint can't be stable. If the
> > maintainer of that tracepoint states it is stable, then by all means,
> > let tools use it.
>
> I'd really like to hear Avi's thoughts on this.
Steven's scheme is fine with me. (kvm tracepoints are more or less
stable, but all tools so far read the tracepoint definitions
dynamically, and can cope with tracepoints being added/changed/removed).
--
error compiling committee.c: too many arguments to function
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 23:48 ` Ted Ts'o
@ 2010-11-18 13:05 ` Mathieu Desnoyers
0 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2010-11-18 13:05 UTC (permalink / raw)
To: Ted Ts'o, Steven Rostedt, Ingo Molnar, Greg KH, linux-kernel,
Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Linus Torvalds, Arjan van de Ven, Lin Ming,
Arnaldo Carvalho de Melo, Peter Zijlstra, Avi Kivity
* Ted Ts'o (tytso@mit.edu) wrote:
> On Wed, Nov 17, 2010 at 12:52:34PM -0500, Steven Rostedt wrote:
> >
> > I still say no to stable tracepoints in modules. Once you open that
> > door, everyone will have it.
>
> What about having KVM define the stable tracepoints in the built-in
> part of the kernel, if CONFIG_KVM is Y or M, and then export the
> tracepoints, such that the tracepoints can be called from a module?
>
> That way the tracepoints aren't being *defined* in a module, they are
> just being *called* from the module. Does that seem like a reasonable
> compromise?
As Avi replied separately, he does not seem to think a stable ABI is needed for
KVM, given that his tools can deal with tracepoint addition/removal pretty well.
However, let's keep this idea in mind if we ever face this issue with other
"core" modules.
Thanks Ted,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem
2010-11-17 15:16 ` Peter Zijlstra
@ 2010-11-23 21:29 ` Steven Rostedt
0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2010-11-23 21:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Greg KH, linux-kernel, Andrew Morton,
Thomas Gleixner, Frederic Weisbecker, Linus Torvalds,
Theodore Tso, Arjan van de Ven, Mathieu Desnoyers, Lin Ming,
Arnaldo Carvalho de Melo
Not sure if the stable vs debug tracepoints debate is dead.
On Wed, 2010-11-17 at 16:16 +0100, Peter Zijlstra wrote:
>
> > Are these events now going to be labeled as stable?
>
> For now I'm concentrating on the hardware events, those are more or less
> stable in that if you run it on the same hardware you get the same thing
> counted. Different hardware may miss some events since it simply cannot
> provide anything resembling, or count something related but slightly
> different (speculative vs retired events, or different cache level, or a
> slightly different definition of miss etc..)
>
> Basically the same we already have with the perf 'generic' events.
>
> > Must every tracepoint
> > we have keep the same data? Linus specifically said at Kernel
> > Summit that he wants absolutely NO modules to have a stable tracepoint.
> >
> > Also, if we just blindly label a tracepoint as "stable" then we must
> > keep all its contents. For example, the sched_switch will contain the
> > priority. As Peter has stated several times, that may go away. We also
> > do not want to lose getting that information, as a lot of us use it.
>
> For now I've not at all looked at representing tracepoints in sysfs, for
> the hardware bits I'm looking to place the event_source (formerly known
> as PMU) in the hardware topology already present in /sys/devices/.
>
> One suggestion was to place the software and tracepoint events
> in /sys/kernel/ some place. Another was to place driver specific, say
> wifi things near the wifi driver node.
Should we create a /sys/kernel/tracepoints/ directory for the stable
tracepoints, and then have each tracepoint's directory there contain its
format? Or is that against the sysfs rule of only having a single line
per file? The format spans multiple lines.
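If the sysfs one-value-per-file rule were applied literally, the multi-line format could be split into one small file per attribute. A minimal sketch of a reader for such a layout follows. It assumes a purely hypothetical <event>/fields/<field>/<attribute> tree (nothing in this thread defines it), and the function name is also hypothetical.

```python
import os


def read_event_fields(event_dir):
    """Read a hypothetical layout <event_dir>/fields/<field>/<attribute>,
    where each attribute file holds a single value (sysfs style)."""
    fields_dir = os.path.join(event_dir, "fields")
    fields = {}
    for field in sorted(os.listdir(fields_dir)):
        field_dir = os.path.join(fields_dir, field)
        attrs = {}
        for attr in sorted(os.listdir(field_dir)):
            with open(os.path.join(field_dir, attr)) as f:
                attrs[attr] = f.read().strip()
        fields[field] = attrs
    return fields
```

One cost of this design is that directory listings carry no order, so field order (and thus record layout) would need its own attribute, for example a hypothetical per-field "offset" file.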
>
> None of the proposals have dealt with the stable vs debug thing, simply
> because none of them were post KS.
>
>
> > Some distros already mount debugfs by default. It's a oneliner in fstab.
>
> I haven't yet seen that... maybe it's automounted on /sys/kernel/debug/ or
> some daft place like that by magic initscripts outside of fstab, but I'd
> not notice that.
I'd still like to keep the general tracepoints in something
like /sys/kernel/debug/events/... using the same format that we come up
with for stable tracepoints. The fact that you need to mount the debugfs
system to use it, should help keep some tools from using it.
>
> > >
> > > So putting it into sysfs looks like a pretty intelligent solution all around and i'd
> > > prefer it.
> >
> > Another downside is that you need to scan hundreds of directories to
> > find tracepoints. And again, are they all now stable?
>
> Not really, they'd be accessible through the bus structure, something
> like:
>
> /sys/bus/event_source/*/events/*
That's for every event? Or just software ones?
>
> Sure, that's more than 1 directory, but then so
> is /debug/tracing/events/*/*/
>
>
> > > Steve, would you be interested in helping out Lin Ming and PeterZ with the sysfs
> > > work - or at least help them come to the conclusion that we want eventfs?
> >
> > I don't think I would be much help with the former, and I'm thinking I'm
> > losing the latter.
>
> Yeah, eventfs simply won't work for what we want to do with hardware
> events.
Well, hardware events are something that only depends on what you have
for hardware. I wasn't thinking of putting them in with the software
events. They probably should be separate.
Can the hardware events be traced? That is, not just profiled, but have
them traced for when they occur individually.
I'll go work on other things until we can come up with an agreement. I
hate to keep wasting days of work just to have someone NAK it again.
-- Steve
end of thread, other threads:[~2010-11-23 21:29 UTC | newest]
Thread overview: 24+ messages
2010-11-17 0:53 [RFC][PATCH 0/5] tracing/events: stable tracepoints Steven Rostedt
2010-11-17 0:53 ` [RFC][PATCH 1/5] [PATCH 1/5] events: Add EVENT_FS the event filesystem Steven Rostedt
2010-11-17 3:32 ` Greg KH
2010-11-17 10:39 ` Ingo Molnar
2010-11-17 12:25 ` Steven Rostedt
2010-11-17 15:03 ` Ingo Molnar
2010-11-17 15:16 ` Peter Zijlstra
2010-11-17 15:16 ` Steven Rostedt
2010-11-17 15:35 ` Peter Zijlstra
2010-11-17 18:42 ` Ted Ts'o
2010-11-17 15:16 ` Peter Zijlstra
2010-11-23 21:29 ` Steven Rostedt
2010-11-17 17:46 ` Mathieu Desnoyers
2010-11-17 17:52 ` Steven Rostedt
2010-11-17 18:12 ` Mathieu Desnoyers
2010-11-18 9:42 ` Avi Kivity
2010-11-17 23:48 ` Ted Ts'o
2010-11-18 13:05 ` Mathieu Desnoyers
2010-11-17 12:16 ` Steven Rostedt
2010-11-17 0:53 ` [RFC][PATCH 2/5] [PATCH 2/5] tracing/events: Add code to (un)register stable events Steven Rostedt
2010-11-17 0:54 ` [RFC][PATCH 3/5] [PATCH 3/5] tracing/events: Add infrastructure to show stable event formats Steven Rostedt
2010-11-17 0:54 ` [RFC][PATCH 4/5] [PATCH 4/5] tracing/events: Add stable event sched_switch Steven Rostedt
2010-11-17 0:54 ` [RFC][PATCH 5/5] [PATCH 5/5] tracing/events: Add sched_migrate_task stable event Steven Rostedt
2010-11-17 20:14 ` [RFC][PATCH 0/5] tracing/events: stable tracepoints Mathieu Desnoyers