* [PATCH 0/4] [GIT PULL] for tip/tracing/filters
@ 2009-04-02 5:27 Steven Rostedt
2009-04-02 5:27 ` [PATCH 1/4] tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events Steven Rostedt
` (6 more replies)
0 siblings, 7 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 5:27 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker
Ingo,
Please pull the latest tip/tracing/filters tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/filters
Steven Rostedt (2):
ring-buffer: add ring_buffer_discard_commit
tracing/filters: use ring_buffer_discard_commit for discarded events
Tom Zanussi (2):
tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro
----
include/linux/ring_buffer.h | 29 ++++++++
kernel/trace/kmemtrace.c | 6 ++
kernel/trace/ring_buffer.c | 125 +++++++++++++++++++++++++++++------
kernel/trace/trace.c | 32 ++++++++-
kernel/trace/trace.h | 23 +++++++
kernel/trace/trace_branch.c | 3 +
kernel/trace/trace_event_types.h | 8 ++-
kernel/trace/trace_events.c | 7 ++
kernel/trace/trace_events_filter.c | 4 +-
kernel/trace/trace_events_stage_2.h | 7 --
kernel/trace/trace_events_stage_3.h | 6 +-
kernel/trace/trace_export.c | 90 ++++++++++++++++++++++++-
kernel/trace/trace_hw_branches.c | 2 +
kernel/trace/trace_power.c | 4 +
14 files changed, 305 insertions(+), 41 deletions(-)
--
--
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/4] tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
2009-04-02 5:27 [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
@ 2009-04-02 5:27 ` Steven Rostedt
2009-04-02 12:17 ` Frederic Weisbecker
2009-04-02 5:27 ` Steven Rostedt
` (5 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 5:27 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0001-tracing-filters-add-run-time-field-descriptions-to.patch --]
[-- Type: text/plain, Size: 19025 bytes --]
From: Tom Zanussi <tzanussi@gmail.com>
This patch adds run-time field descriptions to all the event formats
exported using TRACE_EVENT_FORMAT. It also hooks up all the tracers
that use them (i.e. the tracers in the 'ftrace subsystem') so they can
also have their output filtered by the event-filtering mechanism.
When I was testing this, there were a couple of things that fooled me
into thinking the filters weren't working, when actually they were -
I'll mention them here so others don't make the same mistakes (and file
bug reports. ;-)
One is that some of the tracers trace multiple events e.g. the
sched_switch tracer uses the context_switch and wakeup events, and if
you don't set filters on all of the traced events, the unfiltered output
from the events without filters on them can make it look like the
filtering as a whole isn't working properly, when actually it is doing
what it was asked to do - it just wasn't asked to do the right thing.
The other is that for the really high-volume tracers e.g. the function
tracer, the volume of filtered events can be so high that it pushes the
unfiltered events out of the ring buffer before they can be read so e.g.
cat'ing the trace file repeatedly shows either no output, or once in
awhile some output but that isn't there the next time you read the
trace, which isn't what you normally expect when reading the trace file.
If you read from the trace_pipe file though, you can catch them before
they disappear.
Changes from v1:
As suggested by Frederic Weisbecker:
- get rid of externs in functions
- added unlikely() to filter_check_discard()
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/trace/kmemtrace.c | 6 ++++
kernel/trace/trace.c | 25 +++++++++++++++
kernel/trace/trace.h | 20 ++++++++++++
kernel/trace/trace_branch.c | 3 ++
kernel/trace/trace_event_types.h | 6 ++-
kernel/trace/trace_events.c | 7 ++++
kernel/trace/trace_events_filter.c | 4 +-
kernel/trace/trace_events_stage_2.h | 7 ----
kernel/trace/trace_export.c | 57 +++++++++++++++++++++++++++++++++--
kernel/trace/trace_hw_branches.c | 2 +
kernel/trace/trace_power.c | 4 ++
11 files changed, 127 insertions(+), 14 deletions(-)
diff --git a/kernel/trace/kmemtrace.c b/kernel/trace/kmemtrace.c
index 5011f4d..65aba48 100644
--- a/kernel/trace/kmemtrace.c
+++ b/kernel/trace/kmemtrace.c
@@ -42,6 +42,7 @@ static inline void kmemtrace_alloc(enum kmemtrace_type_id type_id,
gfp_t gfp_flags,
int node)
{
+ struct ftrace_event_call *call = &event_kmem_alloc;
struct trace_array *tr = kmemtrace_array;
struct kmemtrace_alloc_entry *entry;
struct ring_buffer_event *event;
@@ -62,6 +63,8 @@ static inline void kmemtrace_alloc(enum kmemtrace_type_id type_id,
entry->gfp_flags = gfp_flags;
entry->node = node;
+ filter_check_discard(call, entry, event);
+
ring_buffer_unlock_commit(tr->buffer, event);
trace_wake_up();
@@ -71,6 +74,7 @@ static inline void kmemtrace_free(enum kmemtrace_type_id type_id,
unsigned long call_site,
const void *ptr)
{
+ struct ftrace_event_call *call = &event_kmem_free;
struct trace_array *tr = kmemtrace_array;
struct kmemtrace_free_entry *entry;
struct ring_buffer_event *event;
@@ -86,6 +90,8 @@ static inline void kmemtrace_free(enum kmemtrace_type_id type_id,
entry->call_site = call_site;
entry->ptr = ptr;
+ filter_check_discard(call, entry, event);
+
ring_buffer_unlock_commit(tr->buffer, event);
trace_wake_up();
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a0174a4..9a10af0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -907,6 +907,7 @@ trace_function(struct trace_array *tr,
unsigned long ip, unsigned long parent_ip, unsigned long flags,
int pc)
{
+ struct ftrace_event_call *call = &event_function;
struct ring_buffer_event *event;
struct ftrace_entry *entry;
@@ -921,6 +922,9 @@ trace_function(struct trace_array *tr,
entry = ring_buffer_event_data(event);
entry->ip = ip;
entry->parent_ip = parent_ip;
+
+ filter_check_discard(call, entry, event);
+
ring_buffer_unlock_commit(tr->buffer, event);
}
@@ -930,6 +934,7 @@ static int __trace_graph_entry(struct trace_array *tr,
unsigned long flags,
int pc)
{
+ struct ftrace_event_call *call = &event_funcgraph_entry;
struct ring_buffer_event *event;
struct ftrace_graph_ent_entry *entry;
@@ -942,6 +947,7 @@ static int __trace_graph_entry(struct trace_array *tr,
return 0;
entry = ring_buffer_event_data(event);
entry->graph_ent = *trace;
+ filter_check_discard(call, entry, event);
ring_buffer_unlock_commit(global_trace.buffer, event);
return 1;
@@ -952,6 +958,7 @@ static void __trace_graph_return(struct trace_array *tr,
unsigned long flags,
int pc)
{
+ struct ftrace_event_call *call = &event_funcgraph_exit;
struct ring_buffer_event *event;
struct ftrace_graph_ret_entry *entry;
@@ -964,6 +971,7 @@ static void __trace_graph_return(struct trace_array *tr,
return;
entry = ring_buffer_event_data(event);
entry->ret = *trace;
+ filter_check_discard(call, entry, event);
ring_buffer_unlock_commit(global_trace.buffer, event);
}
#endif
@@ -982,6 +990,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
int skip, int pc)
{
#ifdef CONFIG_STACKTRACE
+ struct ftrace_event_call *call = &event_kernel_stack;
struct ring_buffer_event *event;
struct stack_entry *entry;
struct stack_trace trace;
@@ -999,6 +1008,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
trace.entries = entry->caller;
save_stack_trace(&trace);
+ filter_check_discard(call, entry, event);
ring_buffer_unlock_commit(tr->buffer, event);
#endif
}
@@ -1024,6 +1034,7 @@ static void ftrace_trace_userstack(struct trace_array *tr,
unsigned long flags, int pc)
{
#ifdef CONFIG_STACKTRACE
+ struct ftrace_event_call *call = &event_user_stack;
struct ring_buffer_event *event;
struct userstack_entry *entry;
struct stack_trace trace;
@@ -1045,6 +1056,7 @@ static void ftrace_trace_userstack(struct trace_array *tr,
trace.entries = entry->caller;
save_stack_trace_user(&trace);
+ filter_check_discard(call, entry, event);
ring_buffer_unlock_commit(tr->buffer, event);
#endif
}
@@ -1061,6 +1073,7 @@ ftrace_trace_special(void *__tr,
unsigned long arg1, unsigned long arg2, unsigned long arg3,
int pc)
{
+ struct ftrace_event_call *call = &event_special;
struct ring_buffer_event *event;
struct trace_array *tr = __tr;
struct special_entry *entry;
@@ -1073,6 +1086,7 @@ ftrace_trace_special(void *__tr,
entry->arg1 = arg1;
entry->arg2 = arg2;
entry->arg3 = arg3;
+ filter_check_discard(call, entry, event);
trace_buffer_unlock_commit(tr, event, 0, pc);
}
@@ -1089,6 +1103,7 @@ tracing_sched_switch_trace(struct trace_array *tr,
struct task_struct *next,
unsigned long flags, int pc)
{
+ struct ftrace_event_call *call = &event_context_switch;
struct ring_buffer_event *event;
struct ctx_switch_entry *entry;
@@ -1104,6 +1119,9 @@ tracing_sched_switch_trace(struct trace_array *tr,
entry->next_prio = next->prio;
entry->next_state = next->state;
entry->next_cpu = task_cpu(next);
+
+ filter_check_discard(call, entry, event);
+
trace_buffer_unlock_commit(tr, event, flags, pc);
}
@@ -1113,6 +1131,7 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
struct task_struct *curr,
unsigned long flags, int pc)
{
+ struct ftrace_event_call *call = &event_wakeup;
struct ring_buffer_event *event;
struct ctx_switch_entry *entry;
@@ -1129,6 +1148,8 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
entry->next_state = wakee->state;
entry->next_cpu = task_cpu(wakee);
+ filter_check_discard(call, entry, event);
+
ring_buffer_unlock_commit(tr->buffer, event);
ftrace_trace_stack(tr, flags, 6, pc);
ftrace_trace_userstack(tr, flags, pc);
@@ -1230,6 +1251,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
(raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
static u32 trace_buf[TRACE_BUF_SIZE];
+ struct ftrace_event_call *call = &event_bprint;
struct ring_buffer_event *event;
struct trace_array *tr = &global_trace;
struct trace_array_cpu *data;
@@ -1269,6 +1291,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
entry->fmt = fmt;
memcpy(entry->buf, trace_buf, sizeof(u32) * len);
+ filter_check_discard(call, entry, event);
ring_buffer_unlock_commit(tr->buffer, event);
out_unlock:
@@ -1288,6 +1311,7 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
static raw_spinlock_t trace_buf_lock = __RAW_SPIN_LOCK_UNLOCKED;
static char trace_buf[TRACE_BUF_SIZE];
+ struct ftrace_event_call *call = &event_print;
struct ring_buffer_event *event;
struct trace_array *tr = &global_trace;
struct trace_array_cpu *data;
@@ -1323,6 +1347,7 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
memcpy(&entry->buf, trace_buf, len);
entry->buf[len] = 0;
+ filter_check_discard(call, entry, event);
ring_buffer_unlock_commit(tr->buffer, event);
out_unlock:
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index cbc168f..e1615d9 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -858,6 +858,21 @@ extern void filter_free_subsystem_preds(struct event_subsystem *system);
extern int filter_add_subsystem_pred(struct event_subsystem *system,
struct filter_pred *pred);
+static inline void
+filter_check_discard(struct ftrace_event_call *call, void *rec,
+ struct ring_buffer_event *event)
+{
+ if (unlikely(call->preds) && !filter_match_preds(call, rec))
+ ring_buffer_event_discard(event);
+}
+
+#define __common_field(type, item) \
+ ret = trace_define_field(event_call, #type, "common_" #item, \
+ offsetof(typeof(field.ent), item), \
+ sizeof(field.ent.item)); \
+ if (ret) \
+ return ret;
+
void event_trace_printk(unsigned long ip, const char *fmt, ...);
extern struct ftrace_event_call __start_ftrace_events[];
extern struct ftrace_event_call __stop_ftrace_events[];
@@ -889,4 +904,9 @@ do { \
__trace_printk(ip, fmt, ##args); \
} while (0)
+#undef TRACE_EVENT_FORMAT
+#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
+ extern struct ftrace_event_call event_##call;
+#include "trace_event_types.h"
+
#endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index ad8c22e..25e618a 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -30,6 +30,7 @@ static struct trace_array *branch_tracer;
static void
probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
{
+ struct ftrace_event_call *call = &event_branch;
struct trace_array *tr = branch_tracer;
struct ring_buffer_event *event;
struct trace_branch *entry;
@@ -73,6 +74,8 @@ probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
entry->line = f->line;
entry->correct = val == expect;
+ filter_check_discard(call, entry, event);
+
ring_buffer_unlock_commit(tr->buffer, event);
out:
diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index fd78bee..95b147a 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -122,8 +122,10 @@ TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore,
TRACE_EVENT_FORMAT(branch, TRACE_BRANCH, trace_branch, ignore,
TRACE_STRUCT(
TRACE_FIELD(unsigned int, line, line)
- TRACE_FIELD_SPECIAL(char func[TRACE_FUNC_SIZE+1], func, func)
- TRACE_FIELD_SPECIAL(char file[TRACE_FUNC_SIZE+1], file, file)
+ TRACE_FIELD_SPECIAL(char func[TRACE_FUNC_SIZE+1], func,
+ TRACE_FUNC_SIZE+1, func)
+ TRACE_FIELD_SPECIAL(char file[TRACE_FUNC_SIZE+1], file,
+ TRACE_FUNC_SIZE+1, file)
TRACE_FIELD(char, correct, correct)
),
TP_RAW_FMT("%u:%s:%s (%u)")
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 64ec4d2..be9299a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -680,6 +680,7 @@ static struct dentry *
event_subsystem_dir(const char *name, struct dentry *d_events)
{
struct event_subsystem *system;
+ struct dentry *entry;
/* First see if we did not already create this dir */
list_for_each_entry(system, &event_subsystems, list) {
@@ -708,6 +709,12 @@ event_subsystem_dir(const char *name, struct dentry *d_events)
system->preds = NULL;
+ entry = debugfs_create_file("filter", 0644, system->entry, system,
+ &ftrace_subsystem_filter_fops);
+ if (!entry)
+ pr_warning("Could not create debugfs "
+ "'%s/filter' entry\n", name);
+
return system->entry;
}
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 026be41..470ad94 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -185,7 +185,7 @@ void filter_free_subsystem_preds(struct event_subsystem *system)
}
events_for_each(call) {
- if (!call->name || !call->regfunc)
+ if (!call->define_fields)
continue;
if (!strcmp(call->system, system->name))
@@ -324,7 +324,7 @@ int filter_add_subsystem_pred(struct event_subsystem *system,
events_for_each(call) {
int err;
- if (!call->name || !call->regfunc)
+ if (!call->define_fields)
continue;
if (strcmp(call->system, system->name))
diff --git a/kernel/trace/trace_events_stage_2.h b/kernel/trace/trace_events_stage_2.h
index 30743f7..1c94b87 100644
--- a/kernel/trace/trace_events_stage_2.h
+++ b/kernel/trace/trace_events_stage_2.h
@@ -146,13 +146,6 @@ ftrace_format_##call(struct trace_seq *s) \
if (ret) \
return ret;
-#define __common_field(type, item) \
- ret = trace_define_field(event_call, #type, "common_" #item, \
- offsetof(typeof(field.ent), item), \
- sizeof(field.ent.item)); \
- if (ret) \
- return ret;
-
#undef TRACE_EVENT
#define TRACE_EVENT(call, proto, args, tstruct, func, print) \
int \
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 4d9952d..47989be 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -30,7 +30,7 @@
#undef TRACE_FIELD_SPECIAL
-#define TRACE_FIELD_SPECIAL(type_item, item, cmd) \
+#define TRACE_FIELD_SPECIAL(type_item, item, len, cmd) \
ret = trace_seq_printf(s, "\tfield special:" #type_item ";\t" \
"offset:%u;\tsize:%u;\n", \
(unsigned int)offsetof(typeof(field), item), \
@@ -85,18 +85,69 @@ ftrace_format_##call(struct trace_seq *s) \
#define TRACE_ENTRY entry
#undef TRACE_FIELD_SPECIAL
-#define TRACE_FIELD_SPECIAL(type_item, item, cmd) \
+#define TRACE_FIELD_SPECIAL(type_item, item, len, cmd) \
cmd;
#undef TRACE_EVENT_FORMAT
#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
+int ftrace_define_fields_##call(void); \
+static int ftrace_raw_init_event_##call(void); \
\
-static struct ftrace_event_call __used \
+struct ftrace_event_call __used \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) event_##call = { \
.name = #call, \
.id = proto, \
.system = __stringify(TRACE_SYSTEM), \
+ .raw_init = ftrace_raw_init_event_##call, \
.show_format = ftrace_format_##call, \
+ .define_fields = ftrace_define_fields_##call, \
+}; \
+static int ftrace_raw_init_event_##call(void) \
+{ \
+ INIT_LIST_HEAD(&event_##call.fields); \
+ return 0; \
+} \
+
+#include "trace_event_types.h"
+
+#undef TRACE_FIELD
+#define TRACE_FIELD(type, item, assign) \
+ ret = trace_define_field(event_call, #type, #item, \
+ offsetof(typeof(field), item), \
+ sizeof(field.item)); \
+ if (ret) \
+ return ret;
+
+#undef TRACE_FIELD_SPECIAL
+#define TRACE_FIELD_SPECIAL(type, item, len, cmd) \
+ ret = trace_define_field(event_call, #type "[" #len "]", #item, \
+ offsetof(typeof(field), item), \
+ sizeof(field.item)); \
+ if (ret) \
+ return ret;
+
+#undef TRACE_FIELD_ZERO_CHAR
+#define TRACE_FIELD_ZERO_CHAR(item)
+
+#undef TRACE_EVENT_FORMAT
+#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
+int \
+ftrace_define_fields_##call(void) \
+{ \
+ struct ftrace_event_call *event_call = &event_##call; \
+ struct args field; \
+ int ret; \
+ \
+ __common_field(unsigned char, type); \
+ __common_field(unsigned char, flags); \
+ __common_field(unsigned char, preempt_count); \
+ __common_field(int, pid); \
+ __common_field(int, tgid); \
+ \
+ tstruct; \
+ \
+ return ret; \
}
+
#include "trace_event_types.h"
diff --git a/kernel/trace/trace_hw_branches.c b/kernel/trace/trace_hw_branches.c
index 7bfdf4c..e6b275b 100644
--- a/kernel/trace/trace_hw_branches.c
+++ b/kernel/trace/trace_hw_branches.c
@@ -168,6 +168,7 @@ static enum print_line_t bts_trace_print_line(struct trace_iterator *iter)
void trace_hw_branch(u64 from, u64 to)
{
+ struct ftrace_event_call *call = &event_hw_branch;
struct trace_array *tr = hw_branch_trace;
struct ring_buffer_event *event;
struct hw_branch_entry *entry;
@@ -194,6 +195,7 @@ void trace_hw_branch(u64 from, u64 to)
entry->ent.type = TRACE_HW_BRANCHES;
entry->from = from;
entry->to = to;
+ filter_check_discard(call, entry, event);
trace_buffer_unlock_commit(tr, event, 0, 0);
out:
diff --git a/kernel/trace/trace_power.c b/kernel/trace/trace_power.c
index bae791e..8ce7d7d 100644
--- a/kernel/trace/trace_power.c
+++ b/kernel/trace/trace_power.c
@@ -36,6 +36,7 @@ static void probe_power_start(struct power_trace *it, unsigned int type,
static void probe_power_end(struct power_trace *it)
{
+ struct ftrace_event_call *call = &event_power;
struct ring_buffer_event *event;
struct trace_power *entry;
struct trace_array_cpu *data;
@@ -54,6 +55,7 @@ static void probe_power_end(struct power_trace *it)
goto out;
entry = ring_buffer_event_data(event);
entry->state_data = *it;
+ filter_check_discard(call, entry, event);
trace_buffer_unlock_commit(tr, event, 0, 0);
out:
preempt_enable();
@@ -62,6 +64,7 @@ static void probe_power_end(struct power_trace *it)
static void probe_power_mark(struct power_trace *it, unsigned int type,
unsigned int level)
{
+ struct ftrace_event_call *call = &event_power;
struct ring_buffer_event *event;
struct trace_power *entry;
struct trace_array_cpu *data;
@@ -84,6 +87,7 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
goto out;
entry = ring_buffer_event_data(event);
entry->state_data = *it;
+ filter_check_discard(call, entry, event);
trace_buffer_unlock_commit(tr, event, 0, 0);
out:
preempt_enable();
--
1.6.2.1
--
^ permalink raw reply related [flat|nested] 22+ messages in thread
* (no subject)
2009-04-02 5:27 [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
2009-04-02 5:27 ` [PATCH 1/4] tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events Steven Rostedt
@ 2009-04-02 5:27 ` Steven Rostedt
2009-04-02 5:27 ` [PATCH 2/4] tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro Steven Rostedt
` (4 subsequent siblings)
6 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 5:27 UTC (permalink / raw)
To: linux-kernel
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/4] tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro
2009-04-02 5:27 [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
2009-04-02 5:27 ` [PATCH 1/4] tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events Steven Rostedt
2009-04-02 5:27 ` Steven Rostedt
@ 2009-04-02 5:27 ` Steven Rostedt
2009-04-02 5:27 ` [PATCH 3/4] ring-buffer: add ring_buffer_discard_commit Steven Rostedt
` (3 subsequent siblings)
6 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 5:27 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0002-tracing-filters-add-TRACE_EVENT_FORMAT_NOFILTER-eve.patch --]
[-- Type: text/plain, Size: 3976 bytes --]
From: Tom Zanussi <tzanussi@gmail.com>
Frederic Weisbecker suggested that the trace_special event shouldn't be
filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro
that allows an event format to be exported without having a filter
attached, and removes filtering from the trace_special event.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/trace/trace.c | 2 --
kernel/trace/trace.h | 2 ++
kernel/trace/trace_event_types.h | 2 +-
kernel/trace/trace_export.c | 33 +++++++++++++++++++++++++++++++++
4 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9a10af0..8c27ac4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1073,7 +1073,6 @@ ftrace_trace_special(void *__tr,
unsigned long arg1, unsigned long arg2, unsigned long arg3,
int pc)
{
- struct ftrace_event_call *call = &event_special;
struct ring_buffer_event *event;
struct trace_array *tr = __tr;
struct special_entry *entry;
@@ -1086,7 +1085,6 @@ ftrace_trace_special(void *__tr,
entry->arg1 = arg1;
entry->arg2 = arg2;
entry->arg3 = arg3;
- filter_check_discard(call, entry, event);
trace_buffer_unlock_commit(tr, event, 0, pc);
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e1615d9..12316e3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -907,6 +907,8 @@ do { \
#undef TRACE_EVENT_FORMAT
#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
extern struct ftrace_event_call event_##call;
+#undef TRACE_EVENT_FORMAT_NOFILTER
+#define TRACE_EVENT_FORMAT_NOFILTER(call, proto, args, fmt, tstruct, tpfmt)
#include "trace_event_types.h"
#endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 95b147a..cfcecc4 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -57,7 +57,7 @@ TRACE_EVENT_FORMAT(context_switch, TRACE_CTX, ctx_switch_entry, ignore,
TP_RAW_FMT("%u:%u:%u ==+ %u:%u:%u [%03u]")
);
-TRACE_EVENT_FORMAT(special, TRACE_SPECIAL, special_entry, ignore,
+TRACE_EVENT_FORMAT_NOFILTER(special, TRACE_SPECIAL, special_entry, ignore,
TRACE_STRUCT(
TRACE_FIELD(unsigned long, arg1, arg1)
TRACE_FIELD(unsigned long, arg2, arg2)
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 47989be..a16a29d 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -65,6 +65,22 @@ ftrace_format_##call(struct trace_seq *s) \
return ret; \
}
+#undef TRACE_EVENT_FORMAT_NOFILTER
+#define TRACE_EVENT_FORMAT_NOFILTER(call, proto, args, fmt, tstruct, \
+ tpfmt) \
+static int \
+ftrace_format_##call(struct trace_seq *s) \
+{ \
+ struct args field; \
+ int ret; \
+ \
+ tstruct; \
+ \
+ trace_seq_printf(s, "\nprint fmt: \"%s\"\n", tpfmt); \
+ \
+ return ret; \
+}
+
#include "trace_event_types.h"
#undef TRACE_ZERO_CHAR
@@ -109,6 +125,19 @@ static int ftrace_raw_init_event_##call(void) \
return 0; \
} \
+#undef TRACE_EVENT_FORMAT_NOFILTER
+#define TRACE_EVENT_FORMAT_NOFILTER(call, proto, args, fmt, tstruct, \
+ tpfmt) \
+ \
+struct ftrace_event_call __used \
+__attribute__((__aligned__(4))) \
+__attribute__((section("_ftrace_events"))) event_##call = { \
+ .name = #call, \
+ .id = proto, \
+ .system = __stringify(TRACE_SYSTEM), \
+ .show_format = ftrace_format_##call, \
+};
+
#include "trace_event_types.h"
#undef TRACE_FIELD
@@ -150,4 +179,8 @@ ftrace_define_fields_##call(void) \
return ret; \
}
+#undef TRACE_EVENT_FORMAT_NOFILTER
+#define TRACE_EVENT_FORMAT_NOFILTER(call, proto, args, fmt, tstruct, \
+ tpfmt)
+
#include "trace_event_types.h"
--
1.6.2.1
--
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/4] ring-buffer: add ring_buffer_discard_commit
2009-04-02 5:27 [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
` (2 preceding siblings ...)
2009-04-02 5:27 ` [PATCH 2/4] tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro Steven Rostedt
@ 2009-04-02 5:27 ` Steven Rostedt
2009-04-02 5:48 ` Andrew Morton
2009-04-02 5:27 ` [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events Steven Rostedt
` (2 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 5:27 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0003-ring-buffer-add-ring_buffer_discard_commit.patch --]
[-- Type: text/plain, Size: 7097 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The ring_buffer_discard_commit is similar to ring_buffer_event_discard
but it can only be done on an event that has yet to be commited.
Unpredictable results can happen otherwise.
The main difference between ring_buffer_discard_commit and
ring_buffer_event_discard is that ring_buffer_discard_commit will try
to free the data in the ring buffer if nothing has addded data
after the reserved event. If something did, then it acts almost the
same as ring_buffer_event_discard followed by a
ring_buffer_unlock_commit.
Note, either ring_buffer_commit_discard and ring_buffer_unlock_commit
can be called on an event, not both.
This commit also exports both discard functions to be usable by
GPL modules.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
include/linux/ring_buffer.h | 29 ++++++++++
kernel/trace/ring_buffer.c | 125 +++++++++++++++++++++++++++++++++++-------
2 files changed, 133 insertions(+), 21 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index e1b7b21..f0aa486 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -68,9 +68,38 @@ ring_buffer_event_time_delta(struct ring_buffer_event *event)
return event->time_delta;
}
+/*
+ * ring_buffer_event_discard can discard any event in the ring buffer.
+ * it is up to the caller to protect against a reader from
+ * consuming it or a writer from wrapping and replacing it.
+ *
+ * No external protection is needed if this is called before
+ * the event is commited. But in that case it would be better to
+ * use ring_buffer_discard_commit.
+ *
+ * Note, if an event that has not been committed is discarded
+ * with ring_buffer_event_discard, it must still be committed.
+ */
void ring_buffer_event_discard(struct ring_buffer_event *event);
/*
+ * ring_buffer_discard_commit will remove an event that has not
+ * ben committed yet. If this is used, then ring_buffer_unlock_commit
+ * must not be called on the discarded event. This function
+ * will try to remove the event from the ring buffer completely
+ * if another event has not been written after it.
+ *
+ * Example use:
+ *
+ * if (some_condition)
+ * ring_buffer_discard_commit(buffer, event);
+ * else
+ * ring_buffer_unlock_commit(buffer, event);
+ */
+void ring_buffer_discard_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event);
+
+/*
* size is in bytes for each per CPU buffer.
*/
struct ring_buffer *
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index edce2ff..4d0ff54 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -205,27 +205,6 @@ static void rb_event_set_padding(struct ring_buffer_event *event)
event->time_delta = 0;
}
-/**
- * ring_buffer_event_discard - discard an event in the ring buffer
- * @buffer: the ring buffer
- * @event: the event to discard
- *
- * Sometimes a event that is in the ring buffer needs to be ignored.
- * This function lets the user discard an event in the ring buffer
- * and then that event will not be read later.
- *
- * Note, it is up to the user to be careful with this, and protect
- * against races. If the user discards an event that has been consumed
- * it is possible that it could corrupt the ring buffer.
- */
-void ring_buffer_event_discard(struct ring_buffer_event *event)
-{
- event->type = RINGBUF_TYPE_PADDING;
- /* time delta must be non zero */
- if (!event->time_delta)
- event->time_delta = 1;
-}
-
static unsigned
rb_event_data_length(struct ring_buffer_event *event)
{
@@ -1572,6 +1551,110 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
EXPORT_SYMBOL_GPL(ring_buffer_unlock_commit);
/**
+ * ring_buffer_event_discard - discard any event in the ring buffer
+ * @event: the event to discard
+ *
+ * Sometimes a event that is in the ring buffer needs to be ignored.
+ * This function lets the user discard an event in the ring buffer
+ * and then that event will not be read later.
+ *
+ * Note, it is up to the user to be careful with this, and protect
+ * against races. If the user discards an event that has been consumed
+ * it is possible that it could corrupt the ring buffer.
+ */
+void ring_buffer_event_discard(struct ring_buffer_event *event)
+{
+ event->type = RINGBUF_TYPE_PADDING;
+ /* time delta must be non zero */
+ if (!event->time_delta)
+ event->time_delta = 1;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_event_discard);
+
+/**
+ * ring_buffer_commit_discard - discard an event that has not been committed
+ * @buffer: the ring buffer
+ * @event: non committed event to discard
+ *
+ * This is similar to ring_buffer_event_discard but must only be
+ * performed on an event that has not been committed yet. The difference
+ * is that this will also try to free the event from the ring buffer
+ * if another event has not been added behind it.
+ *
+ * If another event has been added behind it, it will set the event
+ * up as discarded, and perform the commit.
+ *
+ * If this function is called, do not call ring_buffer_unlock_commit on
+ * the event.
+ */
+void ring_buffer_discard_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ unsigned long new_index, old_index;
+ struct buffer_page *bpage;
+ unsigned long index;
+ unsigned long addr;
+ int cpu;
+
+ /* The event is discarded regardless */
+ ring_buffer_event_discard(event);
+
+ /*
+ * This must only be called if the event has not been
+ * committed yet. Thus we can assume that preemption
+ * is still disabled.
+ */
+ RB_WARN_ON(buffer, !preempt_count());
+
+ cpu = smp_processor_id();
+ cpu_buffer = buffer->buffers[cpu];
+
+ new_index = rb_event_index(event);
+ old_index = new_index + rb_event_length(event);
+ addr = (unsigned long)event;
+ addr &= PAGE_MASK;
+
+ bpage = cpu_buffer->tail_page;
+
+ if (bpage == (void *)addr && rb_page_write(bpage) == old_index) {
+ /*
+ * This is on the tail page. It is possible that
+ * a write could come in and move the tail page
+ * and write to the next page. That is fine
+ * because we just shorten what is on this page.
+ */
+ index = local_cmpxchg(&bpage->write, old_index, new_index);
+ if (index == old_index)
+ goto out;
+ }
+
+ /*
+ * The commit is still visible by the reader, so we
+ * must increment entries.
+ */
+ cpu_buffer->entries++;
+ out:
+ /*
+ * If a write came in and pushed the tail page
+ * we still need to update the commit pointer
+ * if we were the commit.
+ */
+ if (rb_is_commit(cpu_buffer, event))
+ rb_set_commit_to_write(cpu_buffer);
+
+ /*
+ * Only the last preempt count needs to restore preemption.
+ */
+ if (preempt_count() == 1)
+ ftrace_preempt_enable(per_cpu(rb_need_resched, cpu));
+ else
+ preempt_enable_no_resched_notrace();
+
+}
+EXPORT_SYMBOL_GPL(ring_buffer_discard_commit);
+
+/**
* ring_buffer_write - write data to the buffer without reserving
* @buffer: The ring buffer to write to.
* @length: The length of the data being written (excluding the event header)
--
1.6.2.1
--
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events
2009-04-02 5:27 [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
` (3 preceding siblings ...)
2009-04-02 5:27 ` [PATCH 3/4] ring-buffer: add ring_buffer_discard_commit Steven Rostedt
@ 2009-04-02 5:27 ` Steven Rostedt
2009-04-02 9:06 ` Tom Zanussi
2009-04-02 5:37 ` [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
2009-04-03 12:14 ` Ingo Molnar
6 siblings, 1 reply; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 5:27 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0004-tracing-filters-use-ring_buffer_discard_commit-for.patch --]
[-- Type: text/plain, Size: 2696 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The ring_buffer_discard_commit makes better usage of the ring_buffer
when an event has been discarded. It tries to remove it completely if
possible.
This patch converts the trace event filtering to use
ring_buffer_discard_commit instead of the ring_buffer_event_discard.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/trace/trace.c | 9 +++++++--
kernel/trace/trace.h | 1 +
kernel/trace/trace_events_stage_3.h | 6 +++---
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8c27ac4..d89489c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -893,13 +893,18 @@ trace_current_buffer_lock_reserve(unsigned char type, unsigned long len,
void trace_current_buffer_unlock_commit(struct ring_buffer_event *event,
unsigned long flags, int pc)
{
- return __trace_buffer_unlock_commit(&global_trace, event, flags, pc, 1);
+ __trace_buffer_unlock_commit(&global_trace, event, flags, pc, 1);
}
void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
unsigned long flags, int pc)
{
- return __trace_buffer_unlock_commit(&global_trace, event, flags, pc, 0);
+ __trace_buffer_unlock_commit(&global_trace, event, flags, pc, 0);
+}
+
+void trace_current_buffer_discard_commit(struct ring_buffer_event *event)
+{
+ ring_buffer_discard_commit(global_trace.buffer, event);
}
void
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 12316e3..c0d7a12 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -491,6 +491,7 @@ void trace_current_buffer_unlock_commit(struct ring_buffer_event *event,
unsigned long flags, int pc);
void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
unsigned long flags, int pc);
+void trace_current_buffer_discard_commit(struct ring_buffer_event *event);
struct trace_entry *tracing_get_trace_entry(struct trace_array *tr,
struct trace_array_cpu *data);
diff --git a/kernel/trace/trace_events_stage_3.h b/kernel/trace/trace_events_stage_3.h
index 9d2fa78..d2f34bf 100644
--- a/kernel/trace/trace_events_stage_3.h
+++ b/kernel/trace/trace_events_stage_3.h
@@ -223,9 +223,9 @@ static void ftrace_raw_event_##call(proto) \
assign; \
\
if (call->preds && !filter_match_preds(call, entry)) \
- ring_buffer_event_discard(event); \
- \
- trace_nowake_buffer_unlock_commit(event, irq_flags, pc); \
+ trace_current_buffer_discard_commit(event); \
+ else \
+ trace_nowake_buffer_unlock_commit(event, irq_flags, pc); \
\
} \
\
--
1.6.2.1
--
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] [GIT PULL] for tip/tracing/filters
2009-04-02 5:27 [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
` (4 preceding siblings ...)
2009-04-02 5:27 ` [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events Steven Rostedt
@ 2009-04-02 5:37 ` Steven Rostedt
2009-04-03 12:14 ` Ingo Molnar
6 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 5:37 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker
On Thu, 2 Apr 2009, Steven Rostedt wrote:
>
> Ingo,
>
> Please pull the latest tip/tracing/filters tree, which can be found at:
Ingo,
Also note, I had to merge your kmemtrace tree into the filters branch
because there where some updates that Tom based his changes on.
-- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] ring-buffer: add ring_buffer_discard_commit
2009-04-02 5:27 ` [PATCH 3/4] ring-buffer: add ring_buffer_discard_commit Steven Rostedt
@ 2009-04-02 5:48 ` Andrew Morton
2009-04-02 13:08 ` Steven Rostedt
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2009-04-02 5:48 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker, Steven Rostedt
On Thu, 02 Apr 2009 01:27:24 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> The ring_buffer_discard_commit is similar to ring_buffer_event_discard
> but it can only be done on an event that has yet to be commited.
> Unpredictable results can happen otherwise.
>
> The main difference between ring_buffer_discard_commit and
> ring_buffer_event_discard is that ring_buffer_discard_commit will try
> to free the data in the ring buffer if nothing has addded data
> after the reserved event. If something did, then it acts almost the
> same as ring_buffer_event_discard followed by a
> ring_buffer_unlock_commit.
>
> Note, either ring_buffer_commit_discard and ring_buffer_unlock_commit
> can be called on an event, not both.
>
> This commit also exports both discard functions to be usable by
> GPL modules.
>
> ...
>
> +/*
> + * ring_buffer_event_discard can discard any event in the ring buffer.
> + * it is up to the caller to protect against a reader from
> + * consuming it or a writer from wrapping and replacing it.
> + *
> + * No external protection is needed if this is called before
> + * the event is commited. But in that case it would be better to
> + * use ring_buffer_discard_commit.
> + *
> + * Note, if an event that has not been committed is discarded
> + * with ring_buffer_event_discard, it must still be committed.
> + */
> void ring_buffer_event_discard(struct ring_buffer_event *event);
>
> ...
>
> /**
> + * ring_buffer_event_discard - discard any event in the ring buffer
> + * @event: the event to discard
> + *
> + * Sometimes a event that is in the ring buffer needs to be ignored.
> + * This function lets the user discard an event in the ring buffer
> + * and then that event will not be read later.
> + *
> + * Note, it is up to the user to be careful with this, and protect
> + * against races. If the user discards an event that has been consumed
> + * it is possible that it could corrupt the ring buffer.
> + */
> +void ring_buffer_event_discard(struct ring_buffer_event *event)
> +{
> + event->type = RINGBUF_TYPE_PADDING;
> + /* time delta must be non zero */
> + if (!event->time_delta)
> + event->time_delta = 1;
> +}
> +EXPORT_SYMBOL_GPL(ring_buffer_event_discard);
Seems a bit cumbersome to document the interface partly in the .h file
and partly in the .c file?
Doing all of it in the .c file would be typical.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events
2009-04-02 5:27 ` [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events Steven Rostedt
@ 2009-04-02 9:06 ` Tom Zanussi
2009-04-02 15:01 ` Steven Rostedt
0 siblings, 1 reply; 22+ messages in thread
From: Tom Zanussi @ 2009-04-02 9:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Steven Rostedt
Hi Steve,
Great to see this! I updated my filter removal rcu patch, made some
changes to the filter_check_discard() to work with the new
ring_buffer_discard_commit() and made the necessary changes to the
ftrace tracers as well - see the patch below, which has only been
touch-tested at this point. It seemed to work at first, but then
produced an oops, which may well be a problem related to the changes I
made. I'll look into it more tomorrow night, and will also fix up this
patch and repost it if it basically looks ok. I'll post the oops and
the lines of code that it refers to just in case it it rings a bell...
Tom
[ 240.461982] ------------[ cut here ]------------
[ 240.461993] WARNING: at kernel/trace/ring_buffer.c:1610
ring_buffer_discard_commit+0xfa/0x100()
[ 240.462003] Hardware name: Inspiron 1501
[ 240.462009] Modules linked in: aes_x86_64 aes_generic rfkill_input
radeon drm binfmt_misc rfcomm bridge stp bnep sco l2cap bluetooth ppdev
cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand
freq_table cpufreq_conservative sbs sbshc pci_slot container af_packet
iptable_filter ip_tables x_tables parport_pc lp parport joydev
snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss
snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi
snd_seq_midi_event arc4 snd_seq ecb snd_timer dcdbas sdhci_pci psmouse
hid_logitech evdev b43 snd_seq_device sdhci mmc_core ricoh_mmc pcspkr
serio_raw k8temp snd rfkill i2c_piix4 mac80211 soundcore snd_page_alloc
cfg80211 led_class i2c_core input_polldev video output wmi battery ac
button shpchp pci_hotplug usbhid hid ext2 mbcache sr_mod cdrom
pata_atiixp pata_acpi sd_mod crc_t10dif sg b44 ehci_hcd mii ohci_hcd
ata_generic ahci ssb usbcore libata scsi_mod thermal processor fan fuse
[ 240.462259] Pid: 6143, comm: bash Not tainted 2.6.29-tip #35
[ 240.462267] Call Trace:
[ 240.462280] [<ffffffff8025c5c8>] warn_slowpath+0xd8/0x130
[ 240.462291] [<ffffffff802c2be5>] ? rb_reserve_next_event+0x45/0x360
[ 240.462304] [<ffffffff802c6fc1>] ? trace_buffer_lock_reserve
+0x51/0x70
[ 240.462316] [<ffffffff802c12ea>] ? ring_buffer_unlock_commit
+0x5a/0x60
[ 240.462329] [<ffffffff802c2be5>] ? rb_reserve_next_event+0x45/0x360
[ 240.462340] [<ffffffff802c30cb>] ? ring_buffer_lock_reserve
+0x9b/0xe0
[ 240.462354] [<ffffffff8035f42b>] ? dnotify_parent+0xb/0xc0
[ 240.462365] [<ffffffff8032af65>] ? vfs_write+0x155/0x1d0
[ 240.462375] [<ffffffff802c1e2a>] ring_buffer_discard_commit
+0xfa/0x100
[ 240.462386] [<ffffffff8035f42b>] ? dnotify_parent+0xb/0xc0
[ 240.462396] [<ffffffff802c843b>] trace_function+0xab/0xc0
[ 240.462406] [<ffffffff8032af65>] ? vfs_write+0x155/0x1d0
[ 240.462416] [<ffffffff802cc9c6>] function_trace_call+0x86/0xe0
[ 240.462429] [<ffffffff80213056>] ftrace_call+0x5/0x2b
[ 240.462439] [<ffffffff8035f430>] ? dnotify_parent+0x10/0xc0
[ 240.462449] [<ffffffff8032af65>] vfs_write+0x155/0x1d0
[ 240.462459] [<ffffffff8032b9e7>] sys_write+0x57/0xb0
[ 240.462469] [<ffffffff80213372>] system_call_fastpath+0x16/0x1b
[ 240.462479] ---[ end trace 94aeeb5f59624105 ]---
1592 void ring_buffer_discard_commit(struct ring_buffer *buffer,
1593 struct ring_buffer_event *event)
1594 {
1595 struct ring_buffer_per_cpu *cpu_buffer;
1596 unsigned long new_index, old_index;
1597 struct buffer_page *bpage;
1598 unsigned long index;
1599 unsigned long addr;
1600 int cpu;
1601
1602 /* The event is discarded regardless */
1603 ring_buffer_event_discard(event);
1604
1605 /*
1606 * This must only be called if the event has not been
1607 * committed yet. Thus we can assume that preemption
1608 * is still disabled.
1609 */
1610 RB_WARN_ON(buffer, !preempt_count());
---
kernel/trace/kmemtrace.c | 10 +++----
kernel/trace/trace.c | 44 ++++++++++++++++++----------------
kernel/trace/trace.h | 24 +++++++++++++++----
kernel/trace/trace_branch.c | 5 +--
kernel/trace/trace_events_filter.c | 15 +++++++----
kernel/trace/trace_events_stage_3.h | 5 +---
kernel/trace/trace_hw_branches.c | 4 +-
kernel/trace/trace_power.c | 8 +++---
8 files changed, 64 insertions(+), 51 deletions(-)
diff --git a/kernel/trace/kmemtrace.c b/kernel/trace/kmemtrace.c
index 4a5f053..2c4c8fd 100644
--- a/kernel/trace/kmemtrace.c
+++ b/kernel/trace/kmemtrace.c
@@ -290,9 +290,8 @@ void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
entry->gfp_flags = gfp_flags;
entry->node = node;
- filter_check_discard(call, entry, event);
-
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ if (!filter_check_discard(tr, call, entry, event))
+ trace_buffer_unlock_commit(tr, event, 0, 0);
}
EXPORT_SYMBOL(kmemtrace_mark_alloc_node);
@@ -317,9 +316,8 @@ void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
entry->call_site = call_site;
entry->ptr = ptr;
- filter_check_discard(call, entry, event);
-
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ if (!filter_check_discard(tr, call, entry, event))
+ trace_buffer_unlock_commit(tr, event, 0, 0);
}
EXPORT_SYMBOL(kmemtrace_mark_free);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d89489c..9347965 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -169,6 +169,11 @@ ns2usecs(cycle_t nsec)
*/
static struct trace_array global_trace;
+struct trace_array *get_global_trace(void)
+{
+ return &global_trace;
+}
+
static DEFINE_PER_CPU(struct trace_array_cpu, global_trace_cpu);
cycle_t ftrace_now(int cpu)
@@ -928,9 +933,8 @@ trace_function(struct trace_array *tr,
entry->ip = ip;
entry->parent_ip = parent_ip;
- filter_check_discard(call, entry, event);
-
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(tr, call, entry, event))
+ ring_buffer_unlock_commit(tr->buffer, event);
}
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -952,8 +956,8 @@ static int __trace_graph_entry(struct trace_array *tr,
return 0;
entry = ring_buffer_event_data(event);
entry->graph_ent = *trace;
- filter_check_discard(call, entry, event);
- ring_buffer_unlock_commit(global_trace.buffer, event);
+ if (!filter_check_discard(&global_trace, call, entry, event))
+ ring_buffer_unlock_commit(global_trace.buffer, event);
return 1;
}
@@ -976,8 +980,8 @@ static void __trace_graph_return(struct trace_array *tr,
return;
entry = ring_buffer_event_data(event);
entry->ret = *trace;
- filter_check_discard(call, entry, event);
- ring_buffer_unlock_commit(global_trace.buffer, event);
+ if (!filter_check_discard(&global_trace, call, entry, event))
+ ring_buffer_unlock_commit(global_trace.buffer, event);
}
#endif
@@ -1013,8 +1017,8 @@ static void __ftrace_trace_stack(struct trace_array *tr,
trace.entries = entry->caller;
save_stack_trace(&trace);
- filter_check_discard(call, entry, event);
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(tr, call, entry, event))
+ ring_buffer_unlock_commit(tr->buffer, event);
#endif
}
@@ -1061,8 +1065,8 @@ static void ftrace_trace_userstack(struct trace_array *tr,
trace.entries = entry->caller;
save_stack_trace_user(&trace);
- filter_check_discard(call, entry, event);
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(tr, call, entry, event))
+ ring_buffer_unlock_commit(tr->buffer, event);
#endif
}
@@ -1123,9 +1127,8 @@ tracing_sched_switch_trace(struct trace_array *tr,
entry->next_state = next->state;
entry->next_cpu = task_cpu(next);
- filter_check_discard(call, entry, event);
-
- trace_buffer_unlock_commit(tr, event, flags, pc);
+ if (!filter_check_discard(tr, call, entry, event))
+ trace_buffer_unlock_commit(tr, event, flags, pc);
}
void
@@ -1151,9 +1154,8 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
entry->next_state = wakee->state;
entry->next_cpu = task_cpu(wakee);
- filter_check_discard(call, entry, event);
-
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(tr, call, entry, event))
+ ring_buffer_unlock_commit(tr->buffer, event);
ftrace_trace_stack(tr, flags, 6, pc);
ftrace_trace_userstack(tr, flags, pc);
}
@@ -1294,8 +1296,8 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
entry->fmt = fmt;
memcpy(entry->buf, trace_buf, sizeof(u32) * len);
- filter_check_discard(call, entry, event);
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(tr, call, entry, event))
+ ring_buffer_unlock_commit(tr->buffer, event);
out_unlock:
__raw_spin_unlock(&trace_buf_lock);
@@ -1350,8 +1352,8 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
memcpy(&entry->buf, trace_buf, len);
entry->buf[len] = 0;
- filter_check_discard(call, entry, event);
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(tr, call, entry, event))
+ ring_buffer_unlock_commit(tr->buffer, event);
out_unlock:
__raw_spin_unlock(&trace_buf_lock);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2f16866..0c79a49 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -848,17 +848,29 @@ extern int filter_parse(char **pbuf, struct filter_pred *pred);
extern int filter_add_pred(struct ftrace_event_call *call,
struct filter_pred *pred);
extern void filter_free_preds(struct ftrace_event_call *call);
-extern int filter_match_preds(struct ftrace_event_call *call, void *rec);
+extern int filter_match_preds(struct filter_pred **preds, void *rec);
extern void filter_free_subsystem_preds(struct event_subsystem *system);
extern int filter_add_subsystem_pred(struct event_subsystem *system,
struct filter_pred *pred);
-static inline void
-filter_check_discard(struct ftrace_event_call *call, void *rec,
+static inline int
+filter_check_discard(struct trace_array *trace,
+ struct ftrace_event_call *call,
+ void *rec,
struct ring_buffer_event *event)
{
- if (unlikely(call->preds) && !filter_match_preds(call, rec))
- ring_buffer_event_discard(event);
+ struct filter_pred **preds;
+ int discarded = 0;
+
+ rcu_read_lock();
+ preds = rcu_dereference(call->preds);
+ if (unlikely(preds) && !filter_match_preds(preds, rec)) {
+ ring_buffer_discard_commit(trace->buffer, event);
+ discarded = 1;
+ }
+ rcu_read_unlock();
+
+ return discarded;
}
#define __common_field(type, item) \
@@ -899,6 +911,8 @@ do { \
__trace_printk(ip, fmt, ##args); \
} while (0)
+extern struct trace_array *get_global_trace(void);
+
#undef TRACE_EVENT_FORMAT
#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
extern struct ftrace_event_call event_##call;
diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index 25e618a..a36e4d5 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -74,9 +74,8 @@ probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
entry->line = f->line;
entry->correct = val == expect;
- filter_check_discard(call, entry, event);
-
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(tr, call, entry, event))
+ ring_buffer_unlock_commit(tr->buffer, event);
out:
atomic_dec(&tr->data[cpu]->disabled);
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 470ad94..f7f8469 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -83,14 +83,14 @@ static int filter_pred_string(struct filter_pred *pred, void *event)
}
/* return 1 if event matches, 0 otherwise (discard) */
-int filter_match_preds(struct ftrace_event_call *call, void *rec)
+int filter_match_preds(struct filter_pred **preds, void *rec)
{
int i, matched, and_failed = 0;
struct filter_pred *pred;
for (i = 0; i < MAX_FILTER_PRED; i++) {
- if (call->preds[i]) {
- pred = call->preds[i];
+ if (preds[i]) {
+ pred = preds[i];
if (and_failed && !pred->or)
continue;
matched = pred->fn(pred, rec);
@@ -162,13 +162,16 @@ void filter_free_pred(struct filter_pred *pred)
void filter_free_preds(struct ftrace_event_call *call)
{
+ struct filter_pred **preds;
int i;
if (call->preds) {
+ preds = call->preds;
+ rcu_assign_pointer(call->preds, NULL);
+ synchronize_rcu();
for (i = 0; i < MAX_FILTER_PRED; i++)
- filter_free_pred(call->preds[i]);
- kfree(call->preds);
- call->preds = NULL;
+ filter_free_pred(preds[i]);
+ kfree(preds);
}
}
diff --git a/kernel/trace/trace_events_stage_3.h b/kernel/trace/trace_events_stage_3.h
index d2f34bf..97a9e1e 100644
--- a/kernel/trace/trace_events_stage_3.h
+++ b/kernel/trace/trace_events_stage_3.h
@@ -222,11 +222,8 @@ static void ftrace_raw_event_##call(proto) \
\
assign; \
\
- if (call->preds && !filter_match_preds(call, entry)) \
- trace_current_buffer_discard_commit(event); \
- else \
+ if (!filter_check_discard(get_global_trace(), call, entry, event)) \
trace_nowake_buffer_unlock_commit(event, irq_flags, pc); \
- \
} \
\
static int ftrace_raw_reg_event_##call(void) \
diff --git a/kernel/trace/trace_hw_branches.c b/kernel/trace/trace_hw_branches.c
index e6b275b..7ade2f8 100644
--- a/kernel/trace/trace_hw_branches.c
+++ b/kernel/trace/trace_hw_branches.c
@@ -195,8 +195,8 @@ void trace_hw_branch(u64 from, u64 to)
entry->ent.type = TRACE_HW_BRANCHES;
entry->from = from;
entry->to = to;
- filter_check_discard(call, entry, event);
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ if (!filter_check_discard(tr, call, entry, event))
+ trace_buffer_unlock_commit(tr, event, 0, 0);
out:
atomic_dec(&tr->data[cpu]->disabled);
diff --git a/kernel/trace/trace_power.c b/kernel/trace/trace_power.c
index 8ce7d7d..ed506a1 100644
--- a/kernel/trace/trace_power.c
+++ b/kernel/trace/trace_power.c
@@ -55,8 +55,8 @@ static void probe_power_end(struct power_trace *it)
goto out;
entry = ring_buffer_event_data(event);
entry->state_data = *it;
- filter_check_discard(call, entry, event);
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ if (!filter_check_discard(tr, call, entry, event))
+ trace_buffer_unlock_commit(tr, event, 0, 0);
out:
preempt_enable();
}
@@ -87,8 +87,8 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
goto out;
entry = ring_buffer_event_data(event);
entry->state_data = *it;
- filter_check_discard(call, entry, event);
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ if (!filter_check_discard(tr, call, entry, event))
+ trace_buffer_unlock_commit(tr, event, 0, 0);
out:
preempt_enable();
}
--
1.5.6.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
2009-04-02 5:27 ` [PATCH 1/4] tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events Steven Rostedt
@ 2009-04-02 12:17 ` Frederic Weisbecker
0 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2009-04-02 12:17 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Tom Zanussi, Steven Rostedt
On Thu, Apr 02, 2009 at 01:27:22AM -0400, Steven Rostedt wrote:
> From: Tom Zanussi <tzanussi@gmail.com>
>
> This patch adds run-time field descriptions to all the event formats
> exported using TRACE_EVENT_FORMAT. It also hooks up all the tracers
> that use them (i.e. the tracers in the 'ftrace subsystem') so they can
> also have their output filtered by the event-filtering mechanism.
>
> When I was testing this, there were a couple of things that fooled me
> into thinking the filters weren't working, when actually they were -
> I'll mention them here so others don't make the same mistakes (and file
> bug reports. ;-)
>
> One is that some of the tracers trace multiple events e.g. the
> sched_switch tracer uses the context_switch and wakeup events, and if
> you don't set filters on all of the traced events, the unfiltered output
> from the events without filters on them can make it look like the
> filtering as a whole isn't working properly, when actually it is doing
> what it was asked to do - it just wasn't asked to do the right thing.
>
> The other is that for the really high-volume tracers e.g. the function
> tracer, the volume of filtered events can be so high that it pushes the
> unfiltered events out of the ring buffer before they can be read so e.g.
> cat'ing the trace file repeatedly shows either no output, or once in
> awhile some output but that isn't there the next time you read the
> trace, which isn't what you normally expect when reading the trace file.
> If you read from the trace_pipe file though, you can catch them before
> they disappear.
>
> Changes from v1:
>
> As suggested by Frederic Weisbecker:
>
> - get rid of externs in functions
> - added unlikely() to filter_check_discard()
Thanks Tom!
> Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
> kernel/trace/kmemtrace.c | 6 ++++
> kernel/trace/trace.c | 25 +++++++++++++++
> kernel/trace/trace.h | 20 ++++++++++++
> kernel/trace/trace_branch.c | 3 ++
> kernel/trace/trace_event_types.h | 6 ++-
> kernel/trace/trace_events.c | 7 ++++
> kernel/trace/trace_events_filter.c | 4 +-
> kernel/trace/trace_events_stage_2.h | 7 ----
> kernel/trace/trace_export.c | 57 +++++++++++++++++++++++++++++++++--
> kernel/trace/trace_hw_branches.c | 2 +
> kernel/trace/trace_power.c | 4 ++
> 11 files changed, 127 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/trace/kmemtrace.c b/kernel/trace/kmemtrace.c
> index 5011f4d..65aba48 100644
> --- a/kernel/trace/kmemtrace.c
> +++ b/kernel/trace/kmemtrace.c
> @@ -42,6 +42,7 @@ static inline void kmemtrace_alloc(enum kmemtrace_type_id type_id,
> gfp_t gfp_flags,
> int node)
> {
> + struct ftrace_event_call *call = &event_kmem_alloc;
> struct trace_array *tr = kmemtrace_array;
> struct kmemtrace_alloc_entry *entry;
> struct ring_buffer_event *event;
> @@ -62,6 +63,8 @@ static inline void kmemtrace_alloc(enum kmemtrace_type_id type_id,
> entry->gfp_flags = gfp_flags;
> entry->node = node;
>
> + filter_check_discard(call, entry, event);
> +
> ring_buffer_unlock_commit(tr->buffer, event);
>
> trace_wake_up();
> @@ -71,6 +74,7 @@ static inline void kmemtrace_free(enum kmemtrace_type_id type_id,
> unsigned long call_site,
> const void *ptr)
> {
> + struct ftrace_event_call *call = &event_kmem_free;
> struct trace_array *tr = kmemtrace_array;
> struct kmemtrace_free_entry *entry;
> struct ring_buffer_event *event;
> @@ -86,6 +90,8 @@ static inline void kmemtrace_free(enum kmemtrace_type_id type_id,
> entry->call_site = call_site;
> entry->ptr = ptr;
>
> + filter_check_discard(call, entry, event);
> +
> ring_buffer_unlock_commit(tr->buffer, event);
>
> trace_wake_up();
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index a0174a4..9a10af0 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -907,6 +907,7 @@ trace_function(struct trace_array *tr,
> unsigned long ip, unsigned long parent_ip, unsigned long flags,
> int pc)
> {
> + struct ftrace_event_call *call = &event_function;
> struct ring_buffer_event *event;
> struct ftrace_entry *entry;
>
> @@ -921,6 +922,9 @@ trace_function(struct trace_array *tr,
> entry = ring_buffer_event_data(event);
> entry->ip = ip;
> entry->parent_ip = parent_ip;
> +
> + filter_check_discard(call, entry, event);
> +
> ring_buffer_unlock_commit(tr->buffer, event);
> }
>
> @@ -930,6 +934,7 @@ static int __trace_graph_entry(struct trace_array *tr,
> unsigned long flags,
> int pc)
> {
> + struct ftrace_event_call *call = &event_funcgraph_entry;
> struct ring_buffer_event *event;
> struct ftrace_graph_ent_entry *entry;
>
> @@ -942,6 +947,7 @@ static int __trace_graph_entry(struct trace_array *tr,
> return 0;
> entry = ring_buffer_event_data(event);
> entry->graph_ent = *trace;
> + filter_check_discard(call, entry, event);
I still wonder a bit about this one and __trace_graph_return.
We had some hangs with the function graph tracer by the past because
it's a slow machine and once we add something that slows it more, it can
make a box spending all its time to service the timer interrupt :-)
I think it will not be a problem for one filter, but may be it could
when we have more. Anyway, this is just a matter of testing and if it hangs
we will just have to limit the number of possible filters.
Thanks,
Frederic.
> ring_buffer_unlock_commit(global_trace.buffer, event);
>
> return 1;
> @@ -952,6 +958,7 @@ static void __trace_graph_return(struct trace_array *tr,
> unsigned long flags,
> int pc)
> {
> + struct ftrace_event_call *call = &event_funcgraph_exit;
> struct ring_buffer_event *event;
> struct ftrace_graph_ret_entry *entry;
>
> @@ -964,6 +971,7 @@ static void __trace_graph_return(struct trace_array *tr,
> return;
> entry = ring_buffer_event_data(event);
> entry->ret = *trace;
> + filter_check_discard(call, entry, event);
> ring_buffer_unlock_commit(global_trace.buffer, event);
> }
> #endif
> @@ -982,6 +990,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
> int skip, int pc)
> {
> #ifdef CONFIG_STACKTRACE
> + struct ftrace_event_call *call = &event_kernel_stack;
> struct ring_buffer_event *event;
> struct stack_entry *entry;
> struct stack_trace trace;
> @@ -999,6 +1008,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
> trace.entries = entry->caller;
>
> save_stack_trace(&trace);
> + filter_check_discard(call, entry, event);
> ring_buffer_unlock_commit(tr->buffer, event);
> #endif
> }
> @@ -1024,6 +1034,7 @@ static void ftrace_trace_userstack(struct trace_array *tr,
> unsigned long flags, int pc)
> {
> #ifdef CONFIG_STACKTRACE
> + struct ftrace_event_call *call = &event_user_stack;
> struct ring_buffer_event *event;
> struct userstack_entry *entry;
> struct stack_trace trace;
> @@ -1045,6 +1056,7 @@ static void ftrace_trace_userstack(struct trace_array *tr,
> trace.entries = entry->caller;
>
> save_stack_trace_user(&trace);
> + filter_check_discard(call, entry, event);
> ring_buffer_unlock_commit(tr->buffer, event);
> #endif
> }
> @@ -1061,6 +1073,7 @@ ftrace_trace_special(void *__tr,
> unsigned long arg1, unsigned long arg2, unsigned long arg3,
> int pc)
> {
> + struct ftrace_event_call *call = &event_special;
> struct ring_buffer_event *event;
> struct trace_array *tr = __tr;
> struct special_entry *entry;
> @@ -1073,6 +1086,7 @@ ftrace_trace_special(void *__tr,
> entry->arg1 = arg1;
> entry->arg2 = arg2;
> entry->arg3 = arg3;
> + filter_check_discard(call, entry, event);
> trace_buffer_unlock_commit(tr, event, 0, pc);
> }
>
> @@ -1089,6 +1103,7 @@ tracing_sched_switch_trace(struct trace_array *tr,
> struct task_struct *next,
> unsigned long flags, int pc)
> {
> + struct ftrace_event_call *call = &event_context_switch;
> struct ring_buffer_event *event;
> struct ctx_switch_entry *entry;
>
> @@ -1104,6 +1119,9 @@ tracing_sched_switch_trace(struct trace_array *tr,
> entry->next_prio = next->prio;
> entry->next_state = next->state;
> entry->next_cpu = task_cpu(next);
> +
> + filter_check_discard(call, entry, event);
> +
> trace_buffer_unlock_commit(tr, event, flags, pc);
> }
>
> @@ -1113,6 +1131,7 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
> struct task_struct *curr,
> unsigned long flags, int pc)
> {
> + struct ftrace_event_call *call = &event_wakeup;
> struct ring_buffer_event *event;
> struct ctx_switch_entry *entry;
>
> @@ -1129,6 +1148,8 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
> entry->next_state = wakee->state;
> entry->next_cpu = task_cpu(wakee);
>
> + filter_check_discard(call, entry, event);
> +
> ring_buffer_unlock_commit(tr->buffer, event);
> ftrace_trace_stack(tr, flags, 6, pc);
> ftrace_trace_userstack(tr, flags, pc);
> @@ -1230,6 +1251,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
> (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
> static u32 trace_buf[TRACE_BUF_SIZE];
>
> + struct ftrace_event_call *call = &event_bprint;
> struct ring_buffer_event *event;
> struct trace_array *tr = &global_trace;
> struct trace_array_cpu *data;
> @@ -1269,6 +1291,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
> entry->fmt = fmt;
>
> memcpy(entry->buf, trace_buf, sizeof(u32) * len);
> + filter_check_discard(call, entry, event);
> ring_buffer_unlock_commit(tr->buffer, event);
>
> out_unlock:
> @@ -1288,6 +1311,7 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
> static raw_spinlock_t trace_buf_lock = __RAW_SPIN_LOCK_UNLOCKED;
> static char trace_buf[TRACE_BUF_SIZE];
>
> + struct ftrace_event_call *call = &event_print;
> struct ring_buffer_event *event;
> struct trace_array *tr = &global_trace;
> struct trace_array_cpu *data;
> @@ -1323,6 +1347,7 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
>
> memcpy(&entry->buf, trace_buf, len);
> entry->buf[len] = 0;
> + filter_check_discard(call, entry, event);
> ring_buffer_unlock_commit(tr->buffer, event);
>
> out_unlock:
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index cbc168f..e1615d9 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -858,6 +858,21 @@ extern void filter_free_subsystem_preds(struct event_subsystem *system);
> extern int filter_add_subsystem_pred(struct event_subsystem *system,
> struct filter_pred *pred);
>
> +static inline void
> +filter_check_discard(struct ftrace_event_call *call, void *rec,
> + struct ring_buffer_event *event)
> +{
> + if (unlikely(call->preds) && !filter_match_preds(call, rec))
> + ring_buffer_event_discard(event);
> +}
> +
> +#define __common_field(type, item) \
> + ret = trace_define_field(event_call, #type, "common_" #item, \
> + offsetof(typeof(field.ent), item), \
> + sizeof(field.ent.item)); \
> + if (ret) \
> + return ret;
> +
> void event_trace_printk(unsigned long ip, const char *fmt, ...);
> extern struct ftrace_event_call __start_ftrace_events[];
> extern struct ftrace_event_call __stop_ftrace_events[];
> @@ -889,4 +904,9 @@ do { \
> __trace_printk(ip, fmt, ##args); \
> } while (0)
>
> +#undef TRACE_EVENT_FORMAT
> +#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
> + extern struct ftrace_event_call event_##call;
> +#include "trace_event_types.h"
> +
> #endif /* _LINUX_KERNEL_TRACE_H */
> diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
> index ad8c22e..25e618a 100644
> --- a/kernel/trace/trace_branch.c
> +++ b/kernel/trace/trace_branch.c
> @@ -30,6 +30,7 @@ static struct trace_array *branch_tracer;
> static void
> probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
> {
> + struct ftrace_event_call *call = &event_branch;
> struct trace_array *tr = branch_tracer;
> struct ring_buffer_event *event;
> struct trace_branch *entry;
> @@ -73,6 +74,8 @@ probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
> entry->line = f->line;
> entry->correct = val == expect;
>
> + filter_check_discard(call, entry, event);
> +
> ring_buffer_unlock_commit(tr->buffer, event);
>
> out:
> diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
> index fd78bee..95b147a 100644
> --- a/kernel/trace/trace_event_types.h
> +++ b/kernel/trace/trace_event_types.h
> @@ -122,8 +122,10 @@ TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore,
> TRACE_EVENT_FORMAT(branch, TRACE_BRANCH, trace_branch, ignore,
> TRACE_STRUCT(
> TRACE_FIELD(unsigned int, line, line)
> - TRACE_FIELD_SPECIAL(char func[TRACE_FUNC_SIZE+1], func, func)
> - TRACE_FIELD_SPECIAL(char file[TRACE_FUNC_SIZE+1], file, file)
> + TRACE_FIELD_SPECIAL(char func[TRACE_FUNC_SIZE+1], func,
> + TRACE_FUNC_SIZE+1, func)
> + TRACE_FIELD_SPECIAL(char file[TRACE_FUNC_SIZE+1], file,
> + TRACE_FUNC_SIZE+1, file)
> TRACE_FIELD(char, correct, correct)
> ),
> TP_RAW_FMT("%u:%s:%s (%u)")
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 64ec4d2..be9299a 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -680,6 +680,7 @@ static struct dentry *
> event_subsystem_dir(const char *name, struct dentry *d_events)
> {
> struct event_subsystem *system;
> + struct dentry *entry;
>
> /* First see if we did not already create this dir */
> list_for_each_entry(system, &event_subsystems, list) {
> @@ -708,6 +709,12 @@ event_subsystem_dir(const char *name, struct dentry *d_events)
>
> system->preds = NULL;
>
> + entry = debugfs_create_file("filter", 0644, system->entry, system,
> + &ftrace_subsystem_filter_fops);
> + if (!entry)
> + pr_warning("Could not create debugfs "
> + "'%s/filter' entry\n", name);
> +
> return system->entry;
> }
>
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 026be41..470ad94 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -185,7 +185,7 @@ void filter_free_subsystem_preds(struct event_subsystem *system)
> }
>
> events_for_each(call) {
> - if (!call->name || !call->regfunc)
> + if (!call->define_fields)
> continue;
>
> if (!strcmp(call->system, system->name))
> @@ -324,7 +324,7 @@ int filter_add_subsystem_pred(struct event_subsystem *system,
> events_for_each(call) {
> int err;
>
> - if (!call->name || !call->regfunc)
> + if (!call->define_fields)
> continue;
>
> if (strcmp(call->system, system->name))
> diff --git a/kernel/trace/trace_events_stage_2.h b/kernel/trace/trace_events_stage_2.h
> index 30743f7..1c94b87 100644
> --- a/kernel/trace/trace_events_stage_2.h
> +++ b/kernel/trace/trace_events_stage_2.h
> @@ -146,13 +146,6 @@ ftrace_format_##call(struct trace_seq *s) \
> if (ret) \
> return ret;
>
> -#define __common_field(type, item) \
> - ret = trace_define_field(event_call, #type, "common_" #item, \
> - offsetof(typeof(field.ent), item), \
> - sizeof(field.ent.item)); \
> - if (ret) \
> - return ret;
> -
> #undef TRACE_EVENT
> #define TRACE_EVENT(call, proto, args, tstruct, func, print) \
> int \
> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index 4d9952d..47989be 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -30,7 +30,7 @@
>
>
> #undef TRACE_FIELD_SPECIAL
> -#define TRACE_FIELD_SPECIAL(type_item, item, cmd) \
> +#define TRACE_FIELD_SPECIAL(type_item, item, len, cmd) \
> ret = trace_seq_printf(s, "\tfield special:" #type_item ";\t" \
> "offset:%u;\tsize:%u;\n", \
> (unsigned int)offsetof(typeof(field), item), \
> @@ -85,18 +85,69 @@ ftrace_format_##call(struct trace_seq *s) \
> #define TRACE_ENTRY entry
>
> #undef TRACE_FIELD_SPECIAL
> -#define TRACE_FIELD_SPECIAL(type_item, item, cmd) \
> +#define TRACE_FIELD_SPECIAL(type_item, item, len, cmd) \
> cmd;
>
> #undef TRACE_EVENT_FORMAT
> #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
> +int ftrace_define_fields_##call(void); \
> +static int ftrace_raw_init_event_##call(void); \
> \
> -static struct ftrace_event_call __used \
> +struct ftrace_event_call __used \
> __attribute__((__aligned__(4))) \
> __attribute__((section("_ftrace_events"))) event_##call = { \
> .name = #call, \
> .id = proto, \
> .system = __stringify(TRACE_SYSTEM), \
> + .raw_init = ftrace_raw_init_event_##call, \
> .show_format = ftrace_format_##call, \
> + .define_fields = ftrace_define_fields_##call, \
> +}; \
> +static int ftrace_raw_init_event_##call(void) \
> +{ \
> + INIT_LIST_HEAD(&event_##call.fields); \
> + return 0; \
> +} \
> +
> +#include "trace_event_types.h"
> +
> +#undef TRACE_FIELD
> +#define TRACE_FIELD(type, item, assign) \
> + ret = trace_define_field(event_call, #type, #item, \
> + offsetof(typeof(field), item), \
> + sizeof(field.item)); \
> + if (ret) \
> + return ret;
> +
> +#undef TRACE_FIELD_SPECIAL
> +#define TRACE_FIELD_SPECIAL(type, item, len, cmd) \
> + ret = trace_define_field(event_call, #type "[" #len "]", #item, \
> + offsetof(typeof(field), item), \
> + sizeof(field.item)); \
> + if (ret) \
> + return ret;
> +
> +#undef TRACE_FIELD_ZERO_CHAR
> +#define TRACE_FIELD_ZERO_CHAR(item)
> +
> +#undef TRACE_EVENT_FORMAT
> +#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
> +int \
> +ftrace_define_fields_##call(void) \
> +{ \
> + struct ftrace_event_call *event_call = &event_##call; \
> + struct args field; \
> + int ret; \
> + \
> + __common_field(unsigned char, type); \
> + __common_field(unsigned char, flags); \
> + __common_field(unsigned char, preempt_count); \
> + __common_field(int, pid); \
> + __common_field(int, tgid); \
> + \
> + tstruct; \
> + \
> + return ret; \
> }
> +
> #include "trace_event_types.h"
> diff --git a/kernel/trace/trace_hw_branches.c b/kernel/trace/trace_hw_branches.c
> index 7bfdf4c..e6b275b 100644
> --- a/kernel/trace/trace_hw_branches.c
> +++ b/kernel/trace/trace_hw_branches.c
> @@ -168,6 +168,7 @@ static enum print_line_t bts_trace_print_line(struct trace_iterator *iter)
>
> void trace_hw_branch(u64 from, u64 to)
> {
> + struct ftrace_event_call *call = &event_hw_branch;
> struct trace_array *tr = hw_branch_trace;
> struct ring_buffer_event *event;
> struct hw_branch_entry *entry;
> @@ -194,6 +195,7 @@ void trace_hw_branch(u64 from, u64 to)
> entry->ent.type = TRACE_HW_BRANCHES;
> entry->from = from;
> entry->to = to;
> + filter_check_discard(call, entry, event);
> trace_buffer_unlock_commit(tr, event, 0, 0);
>
> out:
> diff --git a/kernel/trace/trace_power.c b/kernel/trace/trace_power.c
> index bae791e..8ce7d7d 100644
> --- a/kernel/trace/trace_power.c
> +++ b/kernel/trace/trace_power.c
> @@ -36,6 +36,7 @@ static void probe_power_start(struct power_trace *it, unsigned int type,
>
> static void probe_power_end(struct power_trace *it)
> {
> + struct ftrace_event_call *call = &event_power;
> struct ring_buffer_event *event;
> struct trace_power *entry;
> struct trace_array_cpu *data;
> @@ -54,6 +55,7 @@ static void probe_power_end(struct power_trace *it)
> goto out;
> entry = ring_buffer_event_data(event);
> entry->state_data = *it;
> + filter_check_discard(call, entry, event);
> trace_buffer_unlock_commit(tr, event, 0, 0);
> out:
> preempt_enable();
> @@ -62,6 +64,7 @@ static void probe_power_end(struct power_trace *it)
> static void probe_power_mark(struct power_trace *it, unsigned int type,
> unsigned int level)
> {
> + struct ftrace_event_call *call = &event_power;
> struct ring_buffer_event *event;
> struct trace_power *entry;
> struct trace_array_cpu *data;
> @@ -84,6 +87,7 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
> goto out;
> entry = ring_buffer_event_data(event);
> entry->state_data = *it;
> + filter_check_discard(call, entry, event);
> trace_buffer_unlock_commit(tr, event, 0, 0);
> out:
> preempt_enable();
> --
> 1.6.2.1
>
> --
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] ring-buffer: add ring_buffer_discard_commit
2009-04-02 5:48 ` Andrew Morton
@ 2009-04-02 13:08 ` Steven Rostedt
0 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 13:08 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker, Steven Rostedt
On Wed, 1 Apr 2009, Andrew Morton wrote:
> On Thu, 02 Apr 2009 01:27:24 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > The ring_buffer_discard_commit is similar to ring_buffer_event_discard
> > but it can only be done on an event that has yet to be commited.
> > Unpredictable results can happen otherwise.
> >
> > The main difference between ring_buffer_discard_commit and
> > ring_buffer_event_discard is that ring_buffer_discard_commit will try
> > to free the data in the ring buffer if nothing has addded data
> > after the reserved event. If something did, then it acts almost the
> > same as ring_buffer_event_discard followed by a
> > ring_buffer_unlock_commit.
> >
> > Note, either ring_buffer_commit_discard and ring_buffer_unlock_commit
> > can be called on an event, not both.
> >
> > This commit also exports both discard functions to be usable by
> > GPL modules.
> >
> > ...
> >
> > +/*
> > + * ring_buffer_event_discard can discard any event in the ring buffer.
> > + * it is up to the caller to protect against a reader from
> > + * consuming it or a writer from wrapping and replacing it.
> > + *
> > + * No external protection is needed if this is called before
> > + * the event is commited. But in that case it would be better to
> > + * use ring_buffer_discard_commit.
> > + *
> > + * Note, if an event that has not been committed is discarded
> > + * with ring_buffer_event_discard, it must still be committed.
> > + */
> > void ring_buffer_event_discard(struct ring_buffer_event *event);
> >
> > ...
> >
> > /**
> > + * ring_buffer_event_discard - discard any event in the ring buffer
> > + * @event: the event to discard
> > + *
> > + * Sometimes a event that is in the ring buffer needs to be ignored.
> > + * This function lets the user discard an event in the ring buffer
> > + * and then that event will not be read later.
> > + *
> > + * Note, it is up to the user to be careful with this, and protect
> > + * against races. If the user discards an event that has been consumed
> > + * it is possible that it could corrupt the ring buffer.
> > + */
> > +void ring_buffer_event_discard(struct ring_buffer_event *event)
> > +{
> > + event->type = RINGBUF_TYPE_PADDING;
> > + /* time delta must be non zero */
> > + if (!event->time_delta)
> > + event->time_delta = 1;
> > +}
> > +EXPORT_SYMBOL_GPL(ring_buffer_event_discard);
>
> Seems a bit cumbersome to document the interface partly in the .h file
> and partly in the .c file?
>
> Doing all of it in the .c file would be typical.
I thought the full documentation was in the .c file. I actually added the
.h documentation because I was afraid that developers would just look at
the header file and pick the wrong option. The kernel-doc documentation is
for the C file. The .h file was a quick reference for those that wanted to
know which function they needed. Since the two are similar, I wanted to
make it clear what the differences were both in the .c and .h files.
It would even remind me on which to use ;-)
-- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events
2009-04-02 9:06 ` Tom Zanussi
@ 2009-04-02 15:01 ` Steven Rostedt
2009-04-03 11:51 ` Ingo Molnar
2009-04-07 5:46 ` Tom Zanussi
0 siblings, 2 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-02 15:01 UTC (permalink / raw)
To: Tom Zanussi
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Steven Rostedt
On Thu, 2 Apr 2009, Tom Zanussi wrote:
> Hi Steve,
>
> Great to see this! I updated my filter removal rcu patch, made some
> changes to the filter_check_discard() to work with the new
> ring_buffer_discard_commit() and made the necessary changes to the
> ftrace tracers as well - see the patch below, which has only been
> touch-tested at this point. It seemed to work at first, but then
> produced an oops, which may well be a problem related to the changes I
> made. I'll look into it more tomorrow night, and will also fix up this
> patch and repost it if it basically looks ok. I'll post the oops and
> the lines of code that it refers to just in case it it rings a bell...
Yeah, it looks like it is related to changes that you made ;-)
>
> [ 240.461982] ------------[ cut here ]------------
> [ 240.461993] WARNING: at kernel/trace/ring_buffer.c:1610
> ring_buffer_discard_commit+0xfa/0x100()
> [ 240.462259] Pid: 6143, comm: bash Not tainted 2.6.29-tip #35
> [ 240.462267] Call Trace:
> [ 240.462280] [<ffffffff8025c5c8>] warn_slowpath+0xd8/0x130
> [ 240.462291] [<ffffffff802c2be5>] ? rb_reserve_next_event+0x45/0x360
> [ 240.462304] [<ffffffff802c6fc1>] ? trace_buffer_lock_reserve
> +0x51/0x70
> [ 240.462316] [<ffffffff802c12ea>] ? ring_buffer_unlock_commit
> +0x5a/0x60
> [ 240.462329] [<ffffffff802c2be5>] ? rb_reserve_next_event+0x45/0x360
> [ 240.462340] [<ffffffff802c30cb>] ? ring_buffer_lock_reserve
> +0x9b/0xe0
> [ 240.462354] [<ffffffff8035f42b>] ? dnotify_parent+0xb/0xc0
> [ 240.462365] [<ffffffff8032af65>] ? vfs_write+0x155/0x1d0
> [ 240.462375] [<ffffffff802c1e2a>] ring_buffer_discard_commit
> +0xfa/0x100
> [ 240.462386] [<ffffffff8035f42b>] ? dnotify_parent+0xb/0xc0
> [ 240.462396] [<ffffffff802c843b>] trace_function+0xab/0xc0
Oo, you have trace filters in trace_function. I'm not sure we want that.
If we do, please make it a separate function register. That is, the
function tracer is such a extreme hot path, that I would like to avoid as
many branch conditionals as possible. With the dynamic function tracing,
we can pick functions that do conditions when we want them, and we can
pick those that do not. If you look at some of the
register_ftrace_function callers, you will see where I've picked different
functions to be called by the tracer depending on what the options are.
> [ 240.462406] [<ffffffff8032af65>] ? vfs_write+0x155/0x1d0
> [ 240.462416] [<ffffffff802cc9c6>] function_trace_call+0x86/0xe0
> [ 240.462429] [<ffffffff80213056>] ftrace_call+0x5/0x2b
> [ 240.462439] [<ffffffff8035f430>] ? dnotify_parent+0x10/0xc0
> [ 240.462449] [<ffffffff8032af65>] vfs_write+0x155/0x1d0
> [ 240.462459] [<ffffffff8032b9e7>] sys_write+0x57/0xb0
> [ 240.462469] [<ffffffff80213372>] system_call_fastpath+0x16/0x1b
> [ 240.462479] ---[ end trace 94aeeb5f59624105 ]---
>
> 1592 void ring_buffer_discard_commit(struct ring_buffer *buffer,
> 1593 struct ring_buffer_event *event)
> 1594 {
> 1595 struct ring_buffer_per_cpu *cpu_buffer;
> 1596 unsigned long new_index, old_index;
> 1597 struct buffer_page *bpage;
> 1598 unsigned long index;
> 1599 unsigned long addr;
> 1600 int cpu;
> 1601
> 1602 /* The event is discarded regardless */
> 1603 ring_buffer_event_discard(event);
> 1604
> 1605 /*
> 1606 * This must only be called if the event has not been
> 1607 * committed yet. Thus we can assume that preemption
> 1608 * is still disabled.
> 1609 */
> 1610 RB_WARN_ON(buffer, !preempt_count());
As the comment says, did you do a discard outside the a commit. The way
this works is:
event = ring_buffer_lock_reserve(buffer, size); /* preempt disabled */
if (!event)
return;
entry = ring_buffer_event_data(event);
entry->stuff = my_stuff;
if (throw_out_entry)
ring_buffer_discard_commit(buffer, event);
else
ring_buffer_unlock_commit(buffer, event);
If you did not do the above, it will most likely be broken.
>
>
> ---
> kernel/trace/kmemtrace.c | 10 +++----
> kernel/trace/trace.c | 44 ++++++++++++++++++----------------
> kernel/trace/trace.h | 24 +++++++++++++++----
> kernel/trace/trace_branch.c | 5 +--
> kernel/trace/trace_events_filter.c | 15 +++++++----
> kernel/trace/trace_events_stage_3.h | 5 +---
> kernel/trace/trace_hw_branches.c | 4 +-
> kernel/trace/trace_power.c | 8 +++---
> 8 files changed, 64 insertions(+), 51 deletions(-)
>
> diff --git a/kernel/trace/kmemtrace.c b/kernel/trace/kmemtrace.c
> index 4a5f053..2c4c8fd 100644
> --- a/kernel/trace/kmemtrace.c
> +++ b/kernel/trace/kmemtrace.c
> @@ -290,9 +290,8 @@ void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
> entry->gfp_flags = gfp_flags;
> entry->node = node;
>
> - filter_check_discard(call, entry, event);
> -
> - trace_buffer_unlock_commit(tr, event, 0, 0);
> + if (!filter_check_discard(tr, call, entry, event))
> + trace_buffer_unlock_commit(tr, event, 0, 0);
> }
> EXPORT_SYMBOL(kmemtrace_mark_alloc_node);
>
> @@ -317,9 +316,8 @@ void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
> entry->call_site = call_site;
> entry->ptr = ptr;
>
> - filter_check_discard(call, entry, event);
> -
> - trace_buffer_unlock_commit(tr, event, 0, 0);
> + if (!filter_check_discard(tr, call, entry, event))
> + trace_buffer_unlock_commit(tr, event, 0, 0);
> }
> EXPORT_SYMBOL(kmemtrace_mark_free);
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index d89489c..9347965 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -169,6 +169,11 @@ ns2usecs(cycle_t nsec)
> */
> static struct trace_array global_trace;
>
> +struct trace_array *get_global_trace(void)
> +{
> + return &global_trace;
> +}
> +
Oh, reminder, do not do the above. Have a function to do something like:
trace_current_buffer_lock_reserve()
> static DEFINE_PER_CPU(struct trace_array_cpu, global_trace_cpu);
>
> cycle_t ftrace_now(int cpu)
> @@ -928,9 +933,8 @@ trace_function(struct trace_array *tr,
> entry->ip = ip;
> entry->parent_ip = parent_ip;
>
> - filter_check_discard(call, entry, event);
I guess I should have look closer to your patch. I really do not want
filtering in the trace_function, but perhaps at the registering of it.
Note, trace_function is used by lots of tracers. It is not an event.
> -
> - ring_buffer_unlock_commit(tr->buffer, event);
> + if (!filter_check_discard(tr, call, entry, event))
> + ring_buffer_unlock_commit(tr->buffer, event);
> }
>
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> @@ -952,8 +956,8 @@ static int __trace_graph_entry(struct trace_array *tr,
> return 0;
> entry = ring_buffer_event_data(event);
> entry->graph_ent = *trace;
> - filter_check_discard(call, entry, event);
> - ring_buffer_unlock_commit(global_trace.buffer, event);
> + if (!filter_check_discard(&global_trace, call, entry, event))
> + ring_buffer_unlock_commit(global_trace.buffer, event);
Honestly, I think we do not want to do any filtering in any of the
function tracers until we take a closer look at it. These are high
through put code paths, which might also have some side effects. We need
to take a closer look at them.
>
> return 1;
> }
> @@ -976,8 +980,8 @@ static void __trace_graph_return(struct trace_array *tr,
> return;
> entry = ring_buffer_event_data(event);
> entry->ret = *trace;
> - filter_check_discard(call, entry, event);
> - ring_buffer_unlock_commit(global_trace.buffer, event);
> + if (!filter_check_discard(&global_trace, call, entry, event))
> + ring_buffer_unlock_commit(global_trace.buffer, event);
> }
> #endif
>
> @@ -1013,8 +1017,8 @@ static void __ftrace_trace_stack(struct trace_array *tr,
> trace.entries = entry->caller;
>
> save_stack_trace(&trace);
> - filter_check_discard(call, entry, event);
> - ring_buffer_unlock_commit(tr->buffer, event);
> + if (!filter_check_discard(tr, call, entry, event))
> + ring_buffer_unlock_commit(tr->buffer, event);
> #endif
> }
>
> @@ -1061,8 +1065,8 @@ static void ftrace_trace_userstack(struct trace_array *tr,
> trace.entries = entry->caller;
>
> save_stack_trace_user(&trace);
> - filter_check_discard(call, entry, event);
> - ring_buffer_unlock_commit(tr->buffer, event);
> + if (!filter_check_discard(tr, call, entry, event))
> + ring_buffer_unlock_commit(tr->buffer, event);
> #endif
> }
>
> @@ -1123,9 +1127,8 @@ tracing_sched_switch_trace(struct trace_array *tr,
> entry->next_state = next->state;
> entry->next_cpu = task_cpu(next);
>
> - filter_check_discard(call, entry, event);
> -
> - trace_buffer_unlock_commit(tr, event, flags, pc);
> + if (!filter_check_discard(tr, call, entry, event))
> + trace_buffer_unlock_commit(tr, event, flags, pc);
> }
>
> void
> @@ -1151,9 +1154,8 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
> entry->next_state = wakee->state;
> entry->next_cpu = task_cpu(wakee);
>
> - filter_check_discard(call, entry, event);
> -
> - ring_buffer_unlock_commit(tr->buffer, event);
> + if (!filter_check_discard(tr, call, entry, event))
> + ring_buffer_unlock_commit(tr->buffer, event);
> ftrace_trace_stack(tr, flags, 6, pc);
> ftrace_trace_userstack(tr, flags, pc);
> }
> @@ -1294,8 +1296,8 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
> entry->fmt = fmt;
>
> memcpy(entry->buf, trace_buf, sizeof(u32) * len);
> - filter_check_discard(call, entry, event);
> - ring_buffer_unlock_commit(tr->buffer, event);
> + if (!filter_check_discard(tr, call, entry, event))
> + ring_buffer_unlock_commit(tr->buffer, event);
>
> out_unlock:
> __raw_spin_unlock(&trace_buf_lock);
> @@ -1350,8 +1352,8 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
>
> memcpy(&entry->buf, trace_buf, len);
> entry->buf[len] = 0;
> - filter_check_discard(call, entry, event);
> - ring_buffer_unlock_commit(tr->buffer, event);
> + if (!filter_check_discard(tr, call, entry, event))
> + ring_buffer_unlock_commit(tr->buffer, event);
>
> out_unlock:
> __raw_spin_unlock(&trace_buf_lock);
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 2f16866..0c79a49 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -848,17 +848,29 @@ extern int filter_parse(char **pbuf, struct filter_pred *pred);
> extern int filter_add_pred(struct ftrace_event_call *call,
> struct filter_pred *pred);
> extern void filter_free_preds(struct ftrace_event_call *call);
> -extern int filter_match_preds(struct ftrace_event_call *call, void *rec);
> +extern int filter_match_preds(struct filter_pred **preds, void *rec);
> extern void filter_free_subsystem_preds(struct event_subsystem *system);
> extern int filter_add_subsystem_pred(struct event_subsystem *system,
> struct filter_pred *pred);
>
> -static inline void
> -filter_check_discard(struct ftrace_event_call *call, void *rec,
> +static inline int
> +filter_check_discard(struct trace_array *trace,
> + struct ftrace_event_call *call,
> + void *rec,
> struct ring_buffer_event *event)
> {
> - if (unlikely(call->preds) && !filter_match_preds(call, rec))
> - ring_buffer_event_discard(event);
> + struct filter_pred **preds;
> + int discarded = 0;
> +
> + rcu_read_lock();
Ug, do you know the function tracers do trace rcu_read_lock.
Hmm, since we must be here with preemption disabled, you can add your own
WARN_ON(!preempt_count()) too. Remove the rcu_read_locks and just use
synchronize_sched() for the rcu work.
> + preds = rcu_dereference(call->preds);
> + if (unlikely(preds) && !filter_match_preds(preds, rec)) {
> + ring_buffer_discard_commit(trace->buffer, event);
> + discarded = 1;
> + }
> + rcu_read_unlock();
> +
> + return discarded;
> }
>
> #define __common_field(type, item) \
> @@ -899,6 +911,8 @@ do { \
> __trace_printk(ip, fmt, ##args); \
> } while (0)
>
> +extern struct trace_array *get_global_trace(void);
> +
> #undef TRACE_EVENT_FORMAT
> #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
> extern struct ftrace_event_call event_##call;
> diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
> index 25e618a..a36e4d5 100644
> --- a/kernel/trace/trace_branch.c
> +++ b/kernel/trace/trace_branch.c
> @@ -74,9 +74,8 @@ probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
> entry->line = f->line;
> entry->correct = val == expect;
>
> - filter_check_discard(call, entry, event);
> -
> - ring_buffer_unlock_commit(tr->buffer, event);
> + if (!filter_check_discard(tr, call, entry, event))
> + ring_buffer_unlock_commit(tr->buffer, event);
>
> out:
> atomic_dec(&tr->data[cpu]->disabled);
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 470ad94..f7f8469 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -83,14 +83,14 @@ static int filter_pred_string(struct filter_pred *pred, void *event)
> }
>
> /* return 1 if event matches, 0 otherwise (discard) */
> -int filter_match_preds(struct ftrace_event_call *call, void *rec)
> +int filter_match_preds(struct filter_pred **preds, void *rec)
> {
> int i, matched, and_failed = 0;
> struct filter_pred *pred;
>
> for (i = 0; i < MAX_FILTER_PRED; i++) {
> - if (call->preds[i]) {
> - pred = call->preds[i];
> + if (preds[i]) {
> + pred = preds[i];
> if (and_failed && !pred->or)
> continue;
> matched = pred->fn(pred, rec);
> @@ -162,13 +162,16 @@ void filter_free_pred(struct filter_pred *pred)
>
> void filter_free_preds(struct ftrace_event_call *call)
> {
> + struct filter_pred **preds;
> int i;
>
> if (call->preds) {
> + preds = call->preds;
> + rcu_assign_pointer(call->preds, NULL);
> + synchronize_rcu();
synchronize_sched() here instead.
> for (i = 0; i < MAX_FILTER_PRED; i++)
> - filter_free_pred(call->preds[i]);
> - kfree(call->preds);
> - call->preds = NULL;
> + filter_free_pred(preds[i]);
> + kfree(preds);
> }
> }
>
> diff --git a/kernel/trace/trace_events_stage_3.h b/kernel/trace/trace_events_stage_3.h
> index d2f34bf..97a9e1e 100644
> --- a/kernel/trace/trace_events_stage_3.h
> +++ b/kernel/trace/trace_events_stage_3.h
> @@ -222,11 +222,8 @@ static void ftrace_raw_event_##call(proto) \
> \
> assign; \
> \
> - if (call->preds && !filter_match_preds(call, entry)) \
> - trace_current_buffer_discard_commit(event); \
> - else \
> + if (!filter_check_discard(get_global_trace(), call, entry, event)) \
> trace_nowake_buffer_unlock_commit(event, irq_flags, pc); \
> - \
> } \
> \
> static int ftrace_raw_reg_event_##call(void) \
> diff --git a/kernel/trace/trace_hw_branches.c b/kernel/trace/trace_hw_branches.c
> index e6b275b..7ade2f8 100644
> --- a/kernel/trace/trace_hw_branches.c
> +++ b/kernel/trace/trace_hw_branches.c
> @@ -195,8 +195,8 @@ void trace_hw_branch(u64 from, u64 to)
> entry->ent.type = TRACE_HW_BRANCHES;
> entry->from = from;
> entry->to = to;
> - filter_check_discard(call, entry, event);
> - trace_buffer_unlock_commit(tr, event, 0, 0);
> + if (!filter_check_discard(tr, call, entry, event))
> + trace_buffer_unlock_commit(tr, event, 0, 0);
>
> out:
> atomic_dec(&tr->data[cpu]->disabled);
> diff --git a/kernel/trace/trace_power.c b/kernel/trace/trace_power.c
> index 8ce7d7d..ed506a1 100644
> --- a/kernel/trace/trace_power.c
> +++ b/kernel/trace/trace_power.c
> @@ -55,8 +55,8 @@ static void probe_power_end(struct power_trace *it)
> goto out;
> entry = ring_buffer_event_data(event);
> entry->state_data = *it;
> - filter_check_discard(call, entry, event);
> - trace_buffer_unlock_commit(tr, event, 0, 0);
> + if (!filter_check_discard(tr, call, entry, event))
> + trace_buffer_unlock_commit(tr, event, 0, 0);
> out:
> preempt_enable();
> }
> @@ -87,8 +87,8 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
> goto out;
> entry = ring_buffer_event_data(event);
> entry->state_data = *it;
> - filter_check_discard(call, entry, event);
> - trace_buffer_unlock_commit(tr, event, 0, 0);
> + if (!filter_check_discard(tr, call, entry, event))
> + trace_buffer_unlock_commit(tr, event, 0, 0);
> out:
> preempt_enable();
> }
> --
The bug does not stick out in this patch set. Perhaps it is part of the
original too? But something somewhere is calling the discard outside the
reserve and commit.
-- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events
2009-04-02 15:01 ` Steven Rostedt
@ 2009-04-03 11:51 ` Ingo Molnar
2009-04-04 1:11 ` Steven Rostedt
2009-04-07 5:46 ` Tom Zanussi
1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2009-04-03 11:51 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tom Zanussi, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Steven Rostedt
* Steven Rostedt <rostedt@goodmis.org> wrote:
> > [ 240.462396] [<ffffffff802c843b>] trace_function+0xab/0xc0
>
> Oo, you have trace filters in trace_function. I'm not sure we want
> that. If we do, please make it a separate function register. That
> is, the function tracer is such a extreme hot path, that I would
> like to avoid as many branch conditionals as possible. With the
> dynamic function tracing, we can pick functions that do conditions
> when we want them, and we can pick those that do not. If you look
> at some of the register_ftrace_function callers, you will see
> where I've picked different functions to be called by the tracer
> depending on what the options are.
Actually, i like Tom's idea very much: as it allows pretty flexible
context based filtering - not just function based filtering.
So we could filter for:
- a specific PID
- or a pattern of ->comm strings ['bash' or 'sshd' or 'hackbench']
- or we could filter for non-zero preempt-counts
(i.e. critical sections only).
So this extends the function symbol based regexp mechanism in a
quite natural way, and it would be sad to not do this just because
it's ... arguably hard to implement robustly ;-)
The filter expression predicaments are pre-constructed and static at
the point of execution, so they rely on no external facility other
than some internal C code. So it should be possible to do this.
And if the tracepoint causes runtime overhead .. we need to optimize
that. We could register a different, filter-aware mcount callback
depending on whether there's any filter defined there.
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] [GIT PULL] for tip/tracing/filters
2009-04-02 5:27 [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
` (5 preceding siblings ...)
2009-04-02 5:37 ` [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
@ 2009-04-03 12:14 ` Ingo Molnar
2009-04-03 14:36 ` Steven Rostedt
6 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2009-04-03 12:14 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker
* Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Ingo,
>
> Please pull the latest tip/tracing/filters tree, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/filters
>
>
> Steven Rostedt (2):
> ring-buffer: add ring_buffer_discard_commit
> tracing/filters: use ring_buffer_discard_commit for discarded events
>
> Tom Zanussi (2):
> tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
> tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro
I've cherry-picked them into tip/tracing/filters, and there are some
new build failures:
(.text+0x40795): undefined reference to `event_context_switch'
kernel/built-in.o: In function `probe_power_mark':
trace_power.c:(.text+0x44fd7): undefined reference to `event_power'
trace_power.c:(.text+0x44fe1): undefined reference to `event_power'
trace_power.c:(.text+0x44fe6): undefined reference to
`filter_match_preds'
kernel/built-in.o: In function `probe_power_end':
trace_power.c:(.text+0x45064): undefined reference to `event_power'
trace_power.c:(.text+0x4506e): undefined reference to `event_power'
trace_power.c:(.text+0x45073): undefined reference to
`filter_match_preds'
kernel/built-in.o: In function `kmemtrace_free':
so i've excluded it from tip/master for now.
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] [GIT PULL] for tip/tracing/filters
2009-04-03 12:14 ` Ingo Molnar
@ 2009-04-03 14:36 ` Steven Rostedt
2009-04-04 6:50 ` Tom Zanussi
0 siblings, 1 reply; 22+ messages in thread
From: Steven Rostedt @ 2009-04-03 14:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Tom Zanussi, Frederic Weisbecker
On Fri, 3 Apr 2009, Ingo Molnar wrote:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> >
> > Ingo,
> >
> > Please pull the latest tip/tracing/filters tree, which can be found at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> > tip/tracing/filters
> >
> >
> > Steven Rostedt (2):
> > ring-buffer: add ring_buffer_discard_commit
> > tracing/filters: use ring_buffer_discard_commit for discarded events
> >
> > Tom Zanussi (2):
> > tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
> > tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro
>
> I've cherry-picked them into tip/tracing/filters, and there are some
> new build failures:
>
> (.text+0x40795): undefined reference to `event_context_switch'
> kernel/built-in.o: In function `probe_power_mark':
> trace_power.c:(.text+0x44fd7): undefined reference to `event_power'
> trace_power.c:(.text+0x44fe1): undefined reference to `event_power'
> trace_power.c:(.text+0x44fe6): undefined reference to
> `filter_match_preds'
> kernel/built-in.o: In function `probe_power_end':
> trace_power.c:(.text+0x45064): undefined reference to `event_power'
> trace_power.c:(.text+0x4506e): undefined reference to `event_power'
> trace_power.c:(.text+0x45073): undefined reference to
> `filter_match_preds'
> kernel/built-in.o: In function `kmemtrace_free':
>
> so i've excluded it from tip/master for now.
I figured that is due to the missing pieces from the ftrace merge I did.
But I guess we want to tone down on the inter merging of branches.
Tom, could you check out the tracing/filters branch from tip and get that
branch working.
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events
2009-04-03 11:51 ` Ingo Molnar
@ 2009-04-04 1:11 ` Steven Rostedt
0 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-04 1:11 UTC (permalink / raw)
To: Ingo Molnar
Cc: Tom Zanussi, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Steven Rostedt
On Fri, 3 Apr 2009, Ingo Molnar wrote:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > > [ 240.462396] [<ffffffff802c843b>] trace_function+0xab/0xc0
> >
> > Oo, you have trace filters in trace_function. I'm not sure we want
> > that. If we do, please make it a separate function register. That
> > is, the function tracer is such a extreme hot path, that I would
> > like to avoid as many branch conditionals as possible. With the
> > dynamic function tracing, we can pick functions that do conditions
> > when we want them, and we can pick those that do not. If you look
> > at some of the register_ftrace_function callers, you will see
> > where I've picked different functions to be called by the tracer
> > depending on what the options are.
>
> Actually, i like Tom's idea very much: as it allows pretty flexible
> context based filtering - not just function based filtering.
>
> So we could filter for:
>
> - a specific PID
>
> - or a pattern of ->comm strings ['bash' or 'sshd' or 'hackbench']
>
> - or we could filter for non-zero preempt-counts
> (i.e. critical sections only).
>
> So this extends the function symbol based regexp mechanism in a
> quite natural way, and it would be sad to not do this just because
> it's ... arguably hard to implement robustly ;-)
>
> The filter expression predicaments are pre-constructed and static at
> the point of execution, so they rely on no external facility other
> than some internal C code. So it should be possible to do this.
>
> And if the tracepoint causes runtime overhead .. we need to optimize
> that. We could register a different, filter-aware mcount callback
> depending on whether there's any filter defined there.
This is exactly what I was talking about in my ramblings up above ;-)
That is, if filtering is not on, we can just enable the non filtered
version of ftrace. As soon as the function tracer filtering is modified,
we switch to a "filter-aware" version of the function tracer.
I'll look at modifying it do to something like this.
-- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] [GIT PULL] for tip/tracing/filters
2009-04-03 14:36 ` Steven Rostedt
@ 2009-04-04 6:50 ` Tom Zanussi
2009-04-04 15:42 ` Steven Rostedt
0 siblings, 1 reply; 22+ messages in thread
From: Tom Zanussi @ 2009-04-04 6:50 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker
On Fri, 2009-04-03 at 10:36 -0400, Steven Rostedt wrote:
> On Fri, 3 Apr 2009, Ingo Molnar wrote:
>
> >
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > >
> > > Ingo,
> > >
> > > Please pull the latest tip/tracing/filters tree, which can be found at:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> > > tip/tracing/filters
> > >
> > >
> > > Steven Rostedt (2):
> > > ring-buffer: add ring_buffer_discard_commit
> > > tracing/filters: use ring_buffer_discard_commit for discarded events
> > >
> > > Tom Zanussi (2):
> > > tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
> > > tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro
> >
> > I've cherry-picked them into tip/tracing/filters, and there are some
> > new build failures:
> >
> > (.text+0x40795): undefined reference to `event_context_switch'
> > kernel/built-in.o: In function `probe_power_mark':
> > trace_power.c:(.text+0x44fd7): undefined reference to `event_power'
> > trace_power.c:(.text+0x44fe1): undefined reference to `event_power'
> > trace_power.c:(.text+0x44fe6): undefined reference to
> > `filter_match_preds'
> > kernel/built-in.o: In function `probe_power_end':
> > trace_power.c:(.text+0x45064): undefined reference to `event_power'
> > trace_power.c:(.text+0x4506e): undefined reference to `event_power'
> > trace_power.c:(.text+0x45073): undefined reference to
> > `filter_match_preds'
> > kernel/built-in.o: In function `kmemtrace_free':
> >
> > so i've excluded it from tip/master for now.
>
> I figured that is due to the missing pieces from the ftrace merge I did.
> But I guess we want to tone down on the inter merging of branches.
>
> Tom, could you check out the tracing/filters branch from tip and get that
> branch working.
>
I'm guessing the problem is that CONFIG_EVENT_TRACER wasn't selected but
should have been, because those tracers all now export events
(trace_event_types.h, included by trace_export.c). I'm not sure this is
the right way to do it, but the patch below selects EVENT_TRACER if
TRACING is selected (a couple of the exported events belong to bprint
and print, which don't have a config option other than TRACING, so it
seems to belong there).
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
---
kernel/trace/Kconfig | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8a41360..a061746 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -56,6 +56,7 @@ config TRACING
select TRACEPOINTS
select NOP_TRACER
select BINARY_PRINTF
+ select EVENT_TRACER
#
# Minimum requirements an architecture has to meet for us to
@@ -177,7 +178,7 @@ config CONTEXT_SWITCH_TRACER
config EVENT_TRACER
bool "Trace various events in the kernel"
- select TRACING
+ depends on TRACING
help
This tracer hooks to various trace points in the kernel
allowing the user to pick and choose which trace point they
--
1.5.6.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] [GIT PULL] for tip/tracing/filters
2009-04-04 6:50 ` Tom Zanussi
@ 2009-04-04 15:42 ` Steven Rostedt
2009-04-05 7:52 ` Tom Zanussi
2009-04-05 14:47 ` Ingo Molnar
0 siblings, 2 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-04 15:42 UTC (permalink / raw)
To: Tom Zanussi
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker
On Sat, 4 Apr 2009, Tom Zanussi wrote:
> On Fri, 2009-04-03 at 10:36 -0400, Steven Rostedt wrote:
>
> I'm guessing the problem is that CONFIG_EVENT_TRACER wasn't selected but
> should have been, because those tracers all now export events
> (trace_event_types.h, included by trace_export.c). I'm not sure this is
> the right way to do it, but the patch below selects EVENT_TRACER if
> TRACING is selected (a couple of the exported events belong to bprint
> and print, which don't have a config option other than TRACING, so it
> seems to belong there).
>
> Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
>
> ---
> kernel/trace/Kconfig | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 8a41360..a061746 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -56,6 +56,7 @@ config TRACING
> select TRACEPOINTS
> select NOP_TRACER
> select BINARY_PRINTF
> + select EVENT_TRACER
Yeah, we can do this.
>
> #
> # Minimum requirements an architecture has to meet for us to
> @@ -177,7 +178,7 @@ config CONTEXT_SWITCH_TRACER
>
> config EVENT_TRACER
> bool "Trace various events in the kernel"
> - select TRACING
> + depends on TRACING
This means we can only select EVENT_TRACER if something else has been
selected. Can't we keep the "select TRACING" here.
That way, we could just select EVENT_TRACER, but if anything other
tracer is selected, we automatically get EVENT_TRACER.
The logic would be, if you pick any tracer you get the EVENT_TRACER.
But you can still just ask for the EVENT_TRACER with no other tracer.
-- Steve
> help
> This tracer hooks to various trace points in the kernel
> allowing the user to pick and choose which trace point they
> --
> 1.5.6.3
>
>
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] [GIT PULL] for tip/tracing/filters
2009-04-04 15:42 ` Steven Rostedt
@ 2009-04-05 7:52 ` Tom Zanussi
2009-04-05 14:47 ` Ingo Molnar
1 sibling, 0 replies; 22+ messages in thread
From: Tom Zanussi @ 2009-04-05 7:52 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker
On Sat, 2009-04-04 at 11:42 -0400, Steven Rostedt wrote:
> On Sat, 4 Apr 2009, Tom Zanussi wrote:
> > On Fri, 2009-04-03 at 10:36 -0400, Steven Rostedt wrote:
> >
> > I'm guessing the problem is that CONFIG_EVENT_TRACER wasn't selected but
> > should have been, because those tracers all now export events
> > (trace_event_types.h, included by trace_export.c). I'm not sure this is
> > the right way to do it, but the patch below selects EVENT_TRACER if
> > TRACING is selected (a couple of the exported events belong to bprint
> > and print, which don't have a config option other than TRACING, so it
> > seems to belong there).
> >
> > Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
> >
> > ---
> > kernel/trace/Kconfig | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 8a41360..a061746 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -56,6 +56,7 @@ config TRACING
> > select TRACEPOINTS
> > select NOP_TRACER
> > select BINARY_PRINTF
> > + select EVENT_TRACER
>
> Yeah, we can do this.
>
> >
> > #
> > # Minimum requirements an architecture has to meet for us to
> > @@ -177,7 +178,7 @@ config CONTEXT_SWITCH_TRACER
> >
> > config EVENT_TRACER
> > bool "Trace various events in the kernel"
> > - select TRACING
> > + depends on TRACING
>
> This means we can only select EVENT_TRACER if something else has been
> selected. Can't we keep the "select TRACING" here.
>
Keeping 'select TRACING' here gives this error:
kernel/trace/Kconfig:53:error: found recursive dependency: TRACING ->
EVENT_TRACER -> TRACING
I'll have to play around with it some more...
Tom
> That way, we could just select EVENT_TRACER, but if anything other
> tracer is selected, we automatically get EVENT_TRACER.
>
> The logic would be, if you pick any tracer you get the EVENT_TRACER.
> But you can still just ask for the EVENT_TRACER with no other tracer.
> -- Steve
>
>
>
> > help
> > This tracer hooks to various trace points in the kernel
> > allowing the user to pick and choose which trace point they
> > --
> > 1.5.6.3
> >
> >
> >
> >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] [GIT PULL] for tip/tracing/filters
2009-04-04 15:42 ` Steven Rostedt
2009-04-05 7:52 ` Tom Zanussi
@ 2009-04-05 14:47 ` Ingo Molnar
1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2009-04-05 14:47 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tom Zanussi, linux-kernel, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker
* Steven Rostedt <rostedt@goodmis.org> wrote:
> On Sat, 4 Apr 2009, Tom Zanussi wrote:
> > On Fri, 2009-04-03 at 10:36 -0400, Steven Rostedt wrote:
> >
> > I'm guessing the problem is that CONFIG_EVENT_TRACER wasn't selected but
> > should have been, because those tracers all now export events
> > (trace_event_types.h, included by trace_export.c). I'm not sure this is
> > the right way to do it, but the patch below selects EVENT_TRACER if
> > TRACING is selected (a couple of the exported events belong to bprint
> > and print, which don't have a config option other than TRACING, so it
> > seems to belong there).
> >
> > Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
> >
> > ---
> > kernel/trace/Kconfig | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 8a41360..a061746 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -56,6 +56,7 @@ config TRACING
> > select TRACEPOINTS
> > select NOP_TRACER
> > select BINARY_PRINTF
> > + select EVENT_TRACER
>
> Yeah, we can do this.
Yeah - it has become more of a core facility thing, not a separate
plugin.
That's the long-term healthy life-cycle for certain tracing
functionality anyway: if it becomes so useful that all tracers want
to make use of it, it becomes part of the standard infrastructure.
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events
2009-04-02 15:01 ` Steven Rostedt
2009-04-03 11:51 ` Ingo Molnar
@ 2009-04-07 5:46 ` Tom Zanussi
2009-04-07 9:24 ` Steven Rostedt
1 sibling, 1 reply; 22+ messages in thread
From: Tom Zanussi @ 2009-04-07 5:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Steven Rostedt
On Thu, 2009-04-02 at 11:01 -0400, Steven Rostedt wrote:
> On Thu, 2 Apr 2009, Tom Zanussi wrote:
>
> > Hi Steve,
> >
> > Great to see this! I updated my filter removal rcu patch, made some
> > changes to the filter_check_discard() to work with the new
> > ring_buffer_discard_commit() and made the necessary changes to the
> > ftrace tracers as well - see the patch below, which has only been
> > touch-tested at this point. It seemed to work at first, but then
> > produced an oops, which may well be a problem related to the changes I
> > made. I'll look into it more tomorrow night, and will also fix up this
> > patch and repost it if it basically looks ok. I'll post the oops and
> > the lines of code that it refers to just in case it it rings a bell...
>
> Yeah, it looks like it is related to changes that you made ;-)
>
> >
> > [ 240.461982] ------------[ cut here ]------------
> > [ 240.461993] WARNING: at kernel/trace/ring_buffer.c:1610
> > ring_buffer_discard_commit+0xfa/0x100()
>
> > [ 240.462259] Pid: 6143, comm: bash Not tainted 2.6.29-tip #35
> > [ 240.462267] Call Trace:
> > [ 240.462280] [<ffffffff8025c5c8>] warn_slowpath+0xd8/0x130
> > [ 240.462291] [<ffffffff802c2be5>] ? rb_reserve_next_event+0x45/0x360
> > [ 240.462304] [<ffffffff802c6fc1>] ? trace_buffer_lock_reserve
> > +0x51/0x70
> > [ 240.462316] [<ffffffff802c12ea>] ? ring_buffer_unlock_commit
> > +0x5a/0x60
> > [ 240.462329] [<ffffffff802c2be5>] ? rb_reserve_next_event+0x45/0x360
> > [ 240.462340] [<ffffffff802c30cb>] ? ring_buffer_lock_reserve
> > +0x9b/0xe0
> > [ 240.462354] [<ffffffff8035f42b>] ? dnotify_parent+0xb/0xc0
> > [ 240.462365] [<ffffffff8032af65>] ? vfs_write+0x155/0x1d0
> > [ 240.462375] [<ffffffff802c1e2a>] ring_buffer_discard_commit
> > +0xfa/0x100
> > [ 240.462386] [<ffffffff8035f42b>] ? dnotify_parent+0xb/0xc0
> > [ 240.462396] [<ffffffff802c843b>] trace_function+0xab/0xc0
[...]
> The bug does not stick out in this patch set. Perhaps it is part of the
> original too? But something somewhere is calling the discard outside the
> reserve and commit.
>
It doesn't stick out to me either - the funny thing is that it only
happens with CONFIG_PREEMPT_VOLUNTARY - with CONFIG_PREEMPT it's fine.
In fact, with CONFIG_PREEMPT_VOLUNTARY, an RB_WARN_ON(!preempt_count())
right after frace_preempt_disable() triggers immediately, which unless
I'm missing something, should never happen. Is there a bug in
PREEMPT_VOLUNTARY?
Tom
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7a6209f..bac9ab7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1494,6 +1494,8 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
/* If we are tracing schedule, we don't want to recurse */
resched = ftrace_preempt_disable();
+ RB_WARN_ON(buffer, !preempt_count());
+
cpu = raw_smp_processor_id();
if (!cpumask_test_cpu(cpu, buffer->cpumask))
> -- Steve
>
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events
2009-04-07 5:46 ` Tom Zanussi
@ 2009-04-07 9:24 ` Steven Rostedt
0 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2009-04-07 9:24 UTC (permalink / raw)
To: Tom Zanussi
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker, Steven Rostedt
On Tue, 7 Apr 2009, Tom Zanussi wrote:
>
> > The bug does not stick out in this patch set. Perhaps it is part of the
> > original too? But something somewhere is calling the discard outside the
> > reserve and commit.
> >
>
> It doesn't stick out to me either - the funny thing is that it only
> happens with CONFIG_PREEMPT_VOLUNTARY - with CONFIG_PREEMPT it's fine.
>
> In fact, with CONFIG_PREEMPT_VOLUNTARY, an RB_WARN_ON(!preempt_count())
> right after frace_preempt_disable() triggers immediately, which unless
> I'm missing something, should never happen. Is there a bug in
> PREEMPT_VOLUNTARY?
>
> Tom
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 7a6209f..bac9ab7 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1494,6 +1494,8 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
> /* If we are tracing schedule, we don't want to recurse */
> resched = ftrace_preempt_disable();
>
> + RB_WARN_ON(buffer, !preempt_count());
Ug, I'm an idiot ;-)
PREEMPT_VOLUNTARY will always have preempt_count return zero. We need a:
RB_WARN_ON_PREEMPT(buffer);
macro, so that we can define it to warn on CONFIG_PREEMPT but have it be a
nop for all else.
Note, I'm currently traveling so my response times will be limited.
-- Steve
> +
> cpu = raw_smp_processor_id();
>
> if (!cpumask_test_cpu(cpu, buffer->cpumask))
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2009-04-07 9:24 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-02 5:27 [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
2009-04-02 5:27 ` [PATCH 1/4] tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events Steven Rostedt
2009-04-02 12:17 ` Frederic Weisbecker
2009-04-02 5:27 ` Steven Rostedt
2009-04-02 5:27 ` [PATCH 2/4] tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro Steven Rostedt
2009-04-02 5:27 ` [PATCH 3/4] ring-buffer: add ring_buffer_discard_commit Steven Rostedt
2009-04-02 5:48 ` Andrew Morton
2009-04-02 13:08 ` Steven Rostedt
2009-04-02 5:27 ` [PATCH 4/4] tracing/filters: use ring_buffer_discard_commit for discarded events Steven Rostedt
2009-04-02 9:06 ` Tom Zanussi
2009-04-02 15:01 ` Steven Rostedt
2009-04-03 11:51 ` Ingo Molnar
2009-04-04 1:11 ` Steven Rostedt
2009-04-07 5:46 ` Tom Zanussi
2009-04-07 9:24 ` Steven Rostedt
2009-04-02 5:37 ` [PATCH 0/4] [GIT PULL] for tip/tracing/filters Steven Rostedt
2009-04-03 12:14 ` Ingo Molnar
2009-04-03 14:36 ` Steven Rostedt
2009-04-04 6:50 ` Tom Zanussi
2009-04-04 15:42 ` Steven Rostedt
2009-04-05 7:52 ` Tom Zanussi
2009-04-05 14:47 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox