* [PATCH] perfcounters: Support for ftrace event records sampling
@ 2009-08-06 23:25 Frederic Weisbecker
2009-08-07 7:33 ` [tip:perfcounters/tracing] perf_counter: " tip-bot for Frederic Weisbecker
` (2 more replies)
From: Frederic Weisbecker @ 2009-08-06 23:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Frederic Weisbecker, Arnaldo Carvalho de Melo,
Peter Zijlstra, Mike Galbraith, Steven Rostedt, Paul Mackerras,
Pekka Enberg, Gabriel Munteanu, Li Zefan, Lai Jiangshan
This patch brings the kernel side support for ftrace event record
sampling.
A new counter attribute is added: PERF_SAMPLE_TP_RECORD, which requests
ftrace event record sampling.
For now, the event records are always copied from the event profiling
callback and passed down to the perf counter. The perf counter simply
discards the record copy if the attribute is not set.
Result, after forcing the PERF_SAMPLE_TP_RECORD attribute from perf record:
perf record -f -F 1 -a -e workqueue:workqueue_execution
perf report -D
0x21e18 [0x48]: event: 9
.
. ... raw event: size 72 bytes
. 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........
. 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!......
. 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve
. 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1...........
. 0040: e0 b1 31 81 ff ff ff ff .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
The ftrace record starts at offset 0020.
Translation:
struct trace_entry {
type = 0x2b = 43;
flags = 1;
preempt_count = 2;
pid = 0xa = 10;
tgid = 0xa = 10;
}
thread_comm = "events/1"
thread_pid = 0xa = 10;
func = 0xffffffff8131b1e0 = flush_to_ldisc()
What will come next?
- The unconditional copy from the profiling callback brings some cost,
however, and may need to be fixed in the future. For that we need
instant access to the perf counter attributes. This is a
matter of adding a flag to struct ftrace_event.
- Beware of event recursion! Don't ever try to record a lock event,
for example: some locking is used in the profiling fast path and
leads to tracing recursion. That will be fixed using raw spinlocks or
recursion protection.
- Userspace support
- [...]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
---
include/linux/ftrace_event.h | 4 +-
include/linux/perf_counter.h | 9 +++-
include/trace/ftrace.h | 130 +++++++++++++++++++++++++++++++-----------
kernel/perf_counter.c | 18 ++++++-
kernel/trace/trace.h | 4 -
tools/perf/builtin-record.c | 1 +
6 files changed, 125 insertions(+), 41 deletions(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c04e6d9..ac8c6f8 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -89,7 +89,9 @@ enum print_line_t {
TRACE_TYPE_NO_CONSUME = 3 /* Handled but ask to not consume */
};
-
+void tracing_generic_entry_update(struct trace_entry *entry,
+ unsigned long flags,
+ int pc);
struct ring_buffer_event *
trace_current_buffer_lock_reserve(int type, unsigned long len,
unsigned long flags, int pc);
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index e604e6e..a67dd5c 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -121,8 +121,9 @@ enum perf_counter_sample_format {
PERF_SAMPLE_CPU = 1U << 7,
PERF_SAMPLE_PERIOD = 1U << 8,
PERF_SAMPLE_STREAM_ID = 1U << 9,
+ PERF_SAMPLE_TP_RECORD = 1U << 10,
- PERF_SAMPLE_MAX = 1U << 10, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
};
/*
@@ -413,6 +414,11 @@ struct perf_callchain_entry {
__u64 ip[PERF_MAX_STACK_DEPTH];
};
+struct perf_tracepoint_record {
+ int size;
+ char *record;
+};
+
struct task_struct;
/**
@@ -681,6 +687,7 @@ struct perf_sample_data {
struct pt_regs *regs;
u64 addr;
u64 period;
+ void *private;
};
extern int perf_counter_overflow(struct perf_counter *counter, int nmi,
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 380b603..80e5f6c 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -355,15 +355,7 @@ static inline int ftrace_get_offsets_##call( \
/*
* Generate the functions needed for tracepoint perf_counter support.
*
- * static void ftrace_profile_<call>(proto)
- * {
- * extern void perf_tpcounter_event(int, u64, u64);
- * u64 __addr = 0, __count = 1;
- *
- * <assign> <-- here we expand the TP_perf_assign() macro
- *
- * perf_tpcounter_event(event_<call>.id, __addr, __count);
- * }
+ * NOTE: The insertion profile callback (ftrace_profile_<call>) is defined later
*
* static int ftrace_profile_enable_<call>(struct ftrace_event_call *event_call)
* {
@@ -383,28 +375,10 @@ static inline int ftrace_get_offsets_##call( \
*
*/
-#undef TP_fast_assign
-#define TP_fast_assign(args...)
-
-#undef TP_perf_assign
-#define TP_perf_assign(args...) args
-
-#undef __perf_addr
-#define __perf_addr(a) __addr = (a)
-
-#undef __perf_count
-#define __perf_count(c) __count = (c)
-
#undef TRACE_EVENT
#define TRACE_EVENT(call, proto, args, tstruct, assign, print) \
\
-static void ftrace_profile_##call(proto) \
-{ \
- extern void perf_tpcounter_event(int, u64, u64); \
- u64 __addr = 0, __count = 1; \
- { assign; } \
- perf_tpcounter_event(event_##call.id, __addr, __count); \
-} \
+static void ftrace_profile_##call(proto); \
\
static int ftrace_profile_enable_##call(struct ftrace_event_call *event_call) \
{ \
@@ -424,12 +398,6 @@ static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
-#undef TP_fast_assign
-#define TP_fast_assign(args...) args
-
-#undef TP_perf_assign
-#define TP_perf_assign(args...)
-
#endif
/*
@@ -649,5 +617,99 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+/*
+ * Define the insertion callback to profile events
+ *
+ * The job is very similar to ftrace_raw_event_<call> except that we don't
+ * insert in the ring buffer but in a perf counter.
+ *
+ * static void ftrace_profile_<call>(proto)
+ * {
+ * struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
+ * struct ftrace_event_call *event_call = &event_<call>;
+ * extern void perf_tpcounter_event(int, u64, u64, void *, int);
+ * struct ftrace_raw_##call *entry;
+ * u64 __addr = 0, __count = 1;
+ * unsigned long irq_flags;
+ * int __entry_size;
+ * int __data_size;
+ * int pc;
+ *
+ * local_save_flags(irq_flags);
+ * pc = preempt_count();
+ *
+ * __data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
+ * __entry_size = __data_size + sizeof(*entry);
+ *
+ * do {
+ * char raw_data[__entry_size]; <- allocate our sample in the stack
+ * struct trace_entry *ent;
+ *
+ * entry = (struct ftrace_raw_<call> *)raw_data;
+ * ent = &entry->ent;
+ * tracing_generic_entry_update(ent, irq_flags, pc);
+ * ent->type = event_call->id;
+ *
+ * <tstruct> <- do some jobs with dynamic arrays
+ *
+ * <assign> <- affect our values
+ *
+ * perf_tpcounter_event(event_call->id, __addr, __count, entry,
+ * __entry_size); <- submit them to perf counter
+ * } while (0);
+ *
+ * }
+ */
+
+#ifdef CONFIG_EVENT_PROFILE
+
+#undef __perf_addr
+#define __perf_addr(a) __addr = (a)
+
+#undef __perf_count
+#define __perf_count(c) __count = (c)
+
+#undef TRACE_EVENT
+#define TRACE_EVENT(call, proto, args, tstruct, assign, print) \
+static void ftrace_profile_##call(proto) \
+{ \
+ struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
+ struct ftrace_event_call *event_call = &event_##call; \
+ extern void perf_tpcounter_event(int, u64, u64, void *, int); \
+ struct ftrace_raw_##call *entry; \
+ u64 __addr = 0, __count = 1; \
+ unsigned long irq_flags; \
+ int __entry_size; \
+ int __data_size; \
+ int pc; \
+ \
+ local_save_flags(irq_flags); \
+ pc = preempt_count(); \
+ \
+ __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
+ __entry_size = __data_size + sizeof(*entry); \
+ \
+ do { \
+ char raw_data[__entry_size]; \
+ struct trace_entry *ent; \
+ \
+ entry = (struct ftrace_raw_##call *)raw_data; \
+ ent = &entry->ent; \
+ tracing_generic_entry_update(ent, irq_flags, pc); \
+ ent->type = event_call->id; \
+ \
+ tstruct \
+ \
+ { assign; } \
+ \
+ perf_tpcounter_event(event_call->id, __addr, __count, entry,\
+ __entry_size); \
+ } while (0); \
+ \
+}
+
+#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+#endif /* CONFIG_EVENT_PROFILE */
+
#undef _TRACE_PROFILE_INIT
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 06d210c..93f4312 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -2646,6 +2646,7 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
u64 counter;
} group_entry;
struct perf_callchain_entry *callchain = NULL;
+ struct perf_tracepoint_record *tp;
int callchain_size = 0;
u64 time;
struct {
@@ -2714,6 +2715,11 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
header.size += sizeof(u64);
}
+ if (sample_type & PERF_SAMPLE_TP_RECORD) {
+ tp = data->private;
+ header.size += tp->size;
+ }
+
ret = perf_output_begin(&handle, counter, header.size, nmi, 1);
if (ret)
return;
@@ -2777,6 +2783,9 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
}
}
+ if (sample_type & PERF_SAMPLE_TP_RECORD)
+ perf_output_copy(&handle, tp->record, tp->size);
+
perf_output_end(&handle);
}
@@ -3727,11 +3736,18 @@ static const struct pmu perf_ops_task_clock = {
};
#ifdef CONFIG_EVENT_PROFILE
-void perf_tpcounter_event(int event_id, u64 addr, u64 count)
+void perf_tpcounter_event(int event_id, u64 addr, u64 count, void *record,
+ int entry_size)
{
+ struct perf_tracepoint_record tp = {
+ .size = entry_size,
+ .record = record,
+ };
+
struct perf_sample_data data = {
.regs = get_irq_regs(),
.addr = addr,
+ .private = &tp,
};
if (!data.regs)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 87b004f..44308f3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -454,10 +454,6 @@ struct trace_entry *tracing_get_trace_entry(struct trace_array *tr,
struct trace_entry *trace_find_next_entry(struct trace_iterator *iter,
int *ent_cpu, u64 *ent_ts);
-void tracing_generic_entry_update(struct trace_entry *entry,
- unsigned long flags,
- int pc);
-
void default_wait_pipe(struct trace_iterator *iter);
void poll_wait_pipe(struct trace_iterator *iter);
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6da0992..90c9808 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -412,6 +412,7 @@ static void create_counter(int counter, int cpu, pid_t pid)
if (call_graph)
attr->sample_type |= PERF_SAMPLE_CALLCHAIN;
+
attr->mmap = track;
attr->comm = track;
attr->inherit = (cpu < 0) && inherit;
--
1.6.2.3
* [tip:perfcounters/tracing] perf_counter: Support for ftrace event records sampling
2009-08-06 23:25 [PATCH] perfcounters: Support for ftrace event records sampling Frederic Weisbecker
@ 2009-08-07 7:33 ` tip-bot for Frederic Weisbecker
2009-08-07 10:11 ` Ingo Molnar
2009-08-07 10:37 ` [PATCH] perfcounters: " Peter Zijlstra
2009-08-07 15:54 ` Peter Zijlstra
From: tip-bot for Frederic Weisbecker @ 2009-08-07 7:33 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, acme, hpa, mingo, tzanussi, penberg,
a.p.zijlstra, lizf, efault, eduard.munteanu, fweisbec, rostedt,
tglx, laijs, mingo
Commit-ID: 24cb4fabe71f1d896b504cec82277ad7426a3d6d
Gitweb: http://git.kernel.org/tip/24cb4fabe71f1d896b504cec82277ad7426a3d6d
Author: Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Fri, 7 Aug 2009 01:25:54 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 7 Aug 2009 09:30:33 +0200
perf_counter: Support for ftrace event records sampling
This patch implements the kernel side support for ftrace event
record sampling.
A new counter sampling attribute is added:
PERF_SAMPLE_TP_RECORD
which requests ftrace event record sampling. In this case,
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perf counter event buffer as a sample.
Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:
perf record -f -F 1 -a -e workqueue:workqueue_execution
perf report -D
0x21e18 [0x48]: event: 9
.
. ... raw event: size 72 bytes
. 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........
. 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!......
. 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve
. 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1...........
. 0040: e0 b1 31 81 ff ff ff ff .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
The raw ftrace binary record starts at offset 0020.
Translation:
struct trace_entry {
type = 0x2b = 43;
flags = 1;
preempt_count = 2;
pid = 0xa = 10;
tgid = 0xa = 10;
}
thread_comm = "events/1"
thread_pid = 0xa = 10;
func = 0xffffffff8131b1e0 = flush_to_ldisc()
What will come next?
- Userspace support ('perf trace'), 'flight data recorder' mode
for perf trace, etc.
- The unconditional copy from the profiling callback brings
some cost even when no such sampling is requested, and
needs to be fixed in the future. For that we need
instant access to the perf counter attributes.
This is a matter of adding a flag to struct ftrace_event.
- Beware of event recursion! Don't ever try to record
a lock event, for example: some locking is used in
the profiling fast path and leads to tracing recursion.
That will be fixed using raw spinlocks or recursion
protection.
- [...]
- Profit! :-)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1249601154-5597-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/ftrace_event.h | 4 +-
include/linux/perf_counter.h | 9 +++-
include/trace/ftrace.h | 130 +++++++++++++++++++++++++++++++-----------
kernel/perf_counter.c | 18 ++++++-
kernel/trace/trace.h | 4 -
tools/perf/builtin-record.c | 1 +
6 files changed, 125 insertions(+), 41 deletions(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5c093ff..8872ec9 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -89,7 +89,9 @@ enum print_line_t {
TRACE_TYPE_NO_CONSUME = 3 /* Handled but ask to not consume */
};
-
+void tracing_generic_entry_update(struct trace_entry *entry,
+ unsigned long flags,
+ int pc);
struct ring_buffer_event *
trace_current_buffer_lock_reserve(int type, unsigned long len,
unsigned long flags, int pc);
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index bd15d7a..e175982 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -121,8 +121,9 @@ enum perf_counter_sample_format {
PERF_SAMPLE_CPU = 1U << 7,
PERF_SAMPLE_PERIOD = 1U << 8,
PERF_SAMPLE_STREAM_ID = 1U << 9,
+ PERF_SAMPLE_TP_RECORD = 1U << 10,
- PERF_SAMPLE_MAX = 1U << 10, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
};
/*
@@ -402,6 +403,11 @@ struct perf_callchain_entry {
__u64 ip[PERF_MAX_STACK_DEPTH];
};
+struct perf_tracepoint_record {
+ int size;
+ char *record;
+};
+
struct task_struct;
/**
@@ -670,6 +676,7 @@ struct perf_sample_data {
struct pt_regs *regs;
u64 addr;
u64 period;
+ void *private;
};
extern int perf_counter_overflow(struct perf_counter *counter, int nmi,
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index fec71f8..bbf9042 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -353,15 +353,7 @@ static inline int ftrace_get_offsets_##call( \
/*
* Generate the functions needed for tracepoint perf_counter support.
*
- * static void ftrace_profile_<call>(proto)
- * {
- * extern void perf_tpcounter_event(int, u64, u64);
- * u64 __addr = 0, __count = 1;
- *
- * <assign> <-- here we expand the TP_perf_assign() macro
- *
- * perf_tpcounter_event(event_<call>.id, __addr, __count);
- * }
+ * NOTE: The insertion profile callback (ftrace_profile_<call>) is defined later
*
* static int ftrace_profile_enable_<call>(struct ftrace_event_call *event_call)
* {
@@ -381,28 +373,10 @@ static inline int ftrace_get_offsets_##call( \
*
*/
-#undef TP_fast_assign
-#define TP_fast_assign(args...)
-
-#undef TP_perf_assign
-#define TP_perf_assign(args...) args
-
-#undef __perf_addr
-#define __perf_addr(a) __addr = (a)
-
-#undef __perf_count
-#define __perf_count(c) __count = (c)
-
#undef TRACE_EVENT
#define TRACE_EVENT(call, proto, args, tstruct, assign, print) \
\
-static void ftrace_profile_##call(proto) \
-{ \
- extern void perf_tpcounter_event(int, u64, u64); \
- u64 __addr = 0, __count = 1; \
- { assign; } \
- perf_tpcounter_event(event_##call.id, __addr, __count); \
-} \
+static void ftrace_profile_##call(proto); \
\
static int ftrace_profile_enable_##call(struct ftrace_event_call *event_call) \
{ \
@@ -422,12 +396,6 @@ static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
-#undef TP_fast_assign
-#define TP_fast_assign(args...) args
-
-#undef TP_perf_assign
-#define TP_perf_assign(args...)
-
#endif
/*
@@ -647,5 +615,99 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+/*
+ * Define the insertion callback to profile events
+ *
+ * The job is very similar to ftrace_raw_event_<call> except that we don't
+ * insert in the ring buffer but in a perf counter.
+ *
+ * static void ftrace_profile_<call>(proto)
+ * {
+ * struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
+ * struct ftrace_event_call *event_call = &event_<call>;
+ * extern void perf_tpcounter_event(int, u64, u64, void *, int);
+ * struct ftrace_raw_##call *entry;
+ * u64 __addr = 0, __count = 1;
+ * unsigned long irq_flags;
+ * int __entry_size;
+ * int __data_size;
+ * int pc;
+ *
+ * local_save_flags(irq_flags);
+ * pc = preempt_count();
+ *
+ * __data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
+ * __entry_size = __data_size + sizeof(*entry);
+ *
+ * do {
+ * char raw_data[__entry_size]; <- allocate our sample in the stack
+ * struct trace_entry *ent;
+ *
+ * entry = (struct ftrace_raw_<call> *)raw_data;
+ * ent = &entry->ent;
+ * tracing_generic_entry_update(ent, irq_flags, pc);
+ * ent->type = event_call->id;
+ *
+ * <tstruct> <- do some jobs with dynamic arrays
+ *
+ * <assign> <- affect our values
+ *
+ * perf_tpcounter_event(event_call->id, __addr, __count, entry,
+ * __entry_size); <- submit them to perf counter
+ * } while (0);
+ *
+ * }
+ */
+
+#ifdef CONFIG_EVENT_PROFILE
+
+#undef __perf_addr
+#define __perf_addr(a) __addr = (a)
+
+#undef __perf_count
+#define __perf_count(c) __count = (c)
+
+#undef TRACE_EVENT
+#define TRACE_EVENT(call, proto, args, tstruct, assign, print) \
+static void ftrace_profile_##call(proto) \
+{ \
+ struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
+ struct ftrace_event_call *event_call = &event_##call; \
+ extern void perf_tpcounter_event(int, u64, u64, void *, int); \
+ struct ftrace_raw_##call *entry; \
+ u64 __addr = 0, __count = 1; \
+ unsigned long irq_flags; \
+ int __entry_size; \
+ int __data_size; \
+ int pc; \
+ \
+ local_save_flags(irq_flags); \
+ pc = preempt_count(); \
+ \
+ __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
+ __entry_size = __data_size + sizeof(*entry); \
+ \
+ do { \
+ char raw_data[__entry_size]; \
+ struct trace_entry *ent; \
+ \
+ entry = (struct ftrace_raw_##call *)raw_data; \
+ ent = &entry->ent; \
+ tracing_generic_entry_update(ent, irq_flags, pc); \
+ ent->type = event_call->id; \
+ \
+ tstruct \
+ \
+ { assign; } \
+ \
+ perf_tpcounter_event(event_call->id, __addr, __count, entry,\
+ __entry_size); \
+ } while (0); \
+ \
+}
+
+#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+#endif /* CONFIG_EVENT_PROFILE */
+
#undef _TRACE_PROFILE_INIT
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 7d281b4..40155bc 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -2631,6 +2631,7 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
u64 counter;
} group_entry;
struct perf_callchain_entry *callchain = NULL;
+ struct perf_tracepoint_record *tp;
int callchain_size = 0;
u64 time;
struct {
@@ -2699,6 +2700,11 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
header.size += sizeof(u64);
}
+ if (sample_type & PERF_SAMPLE_TP_RECORD) {
+ tp = data->private;
+ header.size += tp->size;
+ }
+
ret = perf_output_begin(&handle, counter, header.size, nmi, 1);
if (ret)
return;
@@ -2762,6 +2768,9 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
}
}
+ if (sample_type & PERF_SAMPLE_TP_RECORD)
+ perf_output_copy(&handle, tp->record, tp->size);
+
perf_output_end(&handle);
}
@@ -3673,11 +3682,18 @@ static const struct pmu perf_ops_task_clock = {
};
#ifdef CONFIG_EVENT_PROFILE
-void perf_tpcounter_event(int event_id, u64 addr, u64 count)
+void perf_tpcounter_event(int event_id, u64 addr, u64 count, void *record,
+ int entry_size)
{
+ struct perf_tracepoint_record tp = {
+ .size = entry_size,
+ .record = record,
+ };
+
struct perf_sample_data data = {
.regs = get_irq_regs(),
.addr = addr,
+ .private = &tp,
};
if (!data.regs)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3548ae5..8b9f4f6 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -438,10 +438,6 @@ struct trace_entry *tracing_get_trace_entry(struct trace_array *tr,
struct trace_entry *trace_find_next_entry(struct trace_iterator *iter,
int *ent_cpu, u64 *ent_ts);
-void tracing_generic_entry_update(struct trace_entry *entry,
- unsigned long flags,
- int pc);
-
void default_wait_pipe(struct trace_iterator *iter);
void poll_wait_pipe(struct trace_iterator *iter);
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6da0992..90c9808 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -412,6 +412,7 @@ static void create_counter(int counter, int cpu, pid_t pid)
if (call_graph)
attr->sample_type |= PERF_SAMPLE_CALLCHAIN;
+
attr->mmap = track;
attr->comm = track;
attr->inherit = (cpu < 0) && inherit;
* Re: [tip:perfcounters/tracing] perf_counter: Support for ftrace event records sampling
2009-08-07 7:33 ` [tip:perfcounters/tracing] perf_counter: " tip-bot for Frederic Weisbecker
@ 2009-08-07 10:11 ` Ingo Molnar
2009-08-07 20:22 ` Frederic Weisbecker
From: Ingo Molnar @ 2009-08-07 10:11 UTC (permalink / raw)
To: mingo, hpa, acme, paulus, linux-kernel, tzanussi, lizf,
a.p.zijlstra, penberg, efault, eduard.munteanu, fweisbec, rostedt,
tglx, laijs
Cc: linux-tip-commits
* tip-bot for Frederic Weisbecker <fweisbec@gmail.com> wrote:
> Commit-ID: 24cb4fabe71f1d896b504cec82277ad7426a3d6d
> Gitweb: http://git.kernel.org/tip/24cb4fabe71f1d896b504cec82277ad7426a3d6d
> Author: Frederic Weisbecker <fweisbec@gmail.com>
> AuthorDate: Fri, 7 Aug 2009 01:25:54 +0200
> Committer: Ingo Molnar <mingo@elte.hu>
> CommitDate: Fri, 7 Aug 2009 09:30:33 +0200
>
> perf_counter: Support for ftrace event records sampling
this commit triggers a new build warning:
kernel/perf_counter.c: In function ‘perf_counter_output’:
kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
Ingo
* Re: [tip:perfcounters/tracing] perf_counter: Support for ftrace event records sampling
2009-08-07 10:11 ` Ingo Molnar
@ 2009-08-07 20:22 ` Frederic Weisbecker
From: Frederic Weisbecker @ 2009-08-07 20:22 UTC (permalink / raw)
To: Ingo Molnar
Cc: mingo, hpa, acme, paulus, linux-kernel, tzanussi, lizf,
a.p.zijlstra, penberg, efault, eduard.munteanu, rostedt, tglx,
laijs, linux-tip-commits
On Fri, Aug 07, 2009 at 12:11:36PM +0200, Ingo Molnar wrote:
>
> * tip-bot for Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > Commit-ID: 24cb4fabe71f1d896b504cec82277ad7426a3d6d
> > Gitweb: http://git.kernel.org/tip/24cb4fabe71f1d896b504cec82277ad7426a3d6d
> > Author: Frederic Weisbecker <fweisbec@gmail.com>
> > AuthorDate: Fri, 7 Aug 2009 01:25:54 +0200
> > Committer: Ingo Molnar <mingo@elte.hu>
> > CommitDate: Fri, 7 Aug 2009 09:30:33 +0200
> >
> > perf_counter: Support for ftrace event records sampling
>
> this commit triggers a new build warning:
>
> kernel/perf_counter.c: In function ‘perf_counter_output’:
> kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
>
> Ingo
Ok, I'll fix it, thanks.
* Re: [PATCH] perfcounters: Support for ftrace event records sampling
2009-08-06 23:25 [PATCH] perfcounters: Support for ftrace event records sampling Frederic Weisbecker
2009-08-07 7:33 ` [tip:perfcounters/tracing] perf_counter: " tip-bot for Frederic Weisbecker
@ 2009-08-07 10:37 ` Peter Zijlstra
2009-08-07 10:58 ` Ingo Molnar
` (2 more replies)
2009-08-07 15:54 ` Peter Zijlstra
From: Peter Zijlstra @ 2009-08-07 10:37 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Mike Galbraith,
Steven Rostedt, Paul Mackerras, Pekka Enberg, Gabriel Munteanu,
Li Zefan, Lai Jiangshan
On Fri, 2009-08-07 at 01:25 +0200, Frederic Weisbecker wrote:
> This patch brings the kernel side support for ftrace event record
> sampling.
>
> A new counter attribute is added: PERF_SAMPLE_TP_RECORD which requests
> ftrace events record sampling.
>
> + PERF_SAMPLE_TP_RECORD = 1U << 10,
I'd really want this thing called PERF_SAMPLE_RAW
> - PERF_SAMPLE_MAX = 1U << 10, /* non-ABI */
> + PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
> };
>
> /*
> @@ -413,6 +414,11 @@ struct perf_callchain_entry {
> __u64 ip[PERF_MAX_STACK_DEPTH];
> };
>
> +struct perf_tracepoint_record {
> + int size;
> + char *record;
> +};
Which would make this:
struct perf_raw_record {
u32 size;
void *data;
};
> struct task_struct;
>
> /**
> @@ -681,6 +687,7 @@ struct perf_sample_data {
> struct pt_regs *regs;
> u64 addr;
> u64 period;
> + void *private;
> };
might as well make that struct perf_raw_record *raw;
> @@ -649,5 +617,99 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
>
> #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
>
> +/*
> + * Define the insertion callback to profile events
> + *
> + * The job is very similar to ftrace_raw_event_<call> except that we don't
> + * insert in the ring buffer but in a perf counter.
> + *
> + * static void ftrace_profile_<call>(proto)
> + * {
> + * struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
> + * struct ftrace_event_call *event_call = &event_<call>;
> + * extern void perf_tpcounter_event(int, u64, u64, void *, int);
> + * struct ftrace_raw_##call *entry;
> + * u64 __addr = 0, __count = 1;
> + * unsigned long irq_flags;
> + * int __entry_size;
> + * int __data_size;
> + * int pc;
> + *
> + * local_save_flags(irq_flags);
> + * pc = preempt_count();
> + *
> + * __data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
> + * __entry_size = __data_size + sizeof(*entry);
> + *
> + * do {
> + * char raw_data[__entry_size]; <- allocate our sample in the stack
> + * struct trace_entry *ent;
> + *
> + * entry = (struct ftrace_raw_<call> *)raw_data;
> + * ent = &entry->ent;
> + * tracing_generic_entry_update(ent, irq_flags, pc);
> + * ent->type = event_call->id;
> + *
> + * <tstruct> <- do some jobs with dynamic arrays
> + *
> + * <assign> <- affect our values
> + *
> + * perf_tpcounter_event(event_call->id, __addr, __count, entry,
> + * __entry_size); <- submit them to perf counter
> + * } while (0);
> + *
> + * }
> + */
> +
> +#ifdef CONFIG_EVENT_PROFILE
> +
> +#undef __perf_addr
> +#define __perf_addr(a) __addr = (a)
> +
> +#undef __perf_count
> +#define __perf_count(c) __count = (c)
> +
> +#undef TRACE_EVENT
> +#define TRACE_EVENT(call, proto, args, tstruct, assign, print) \
> +static void ftrace_profile_##call(proto) \
> +{ \
> + struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> + struct ftrace_event_call *event_call = &event_##call; \
> + extern void perf_tpcounter_event(int, u64, u64, void *, int); \
> + struct ftrace_raw_##call *entry; \
> + u64 __addr = 0, __count = 1; \
> + unsigned long irq_flags; \
> + int __entry_size; \
> + int __data_size; \
> + int pc; \
> + \
> + local_save_flags(irq_flags); \
> + pc = preempt_count(); \
> + \
> + __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> + __entry_size = __data_size + sizeof(*entry); \
> + \
> + do { \
> + char raw_data[__entry_size]; \
> + struct trace_entry *ent; \
> + \
> + entry = (struct ftrace_raw_##call *)raw_data; \
> + ent = &entry->ent; \
> + tracing_generic_entry_update(ent, irq_flags, pc); \
> + ent->type = event_call->id; \
> + \
> + tstruct \
> + \
> + { assign; } \
> + \
> + perf_tpcounter_event(event_call->id, __addr, __count, entry,\
> + __entry_size); \
> + } while (0); \
> + \
> +}
ok, so the one concern I have here is that the data needs to fit on the
stack. What if someone puts a large string in the data?
> +
> +#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> +#endif /* CONFIG_EVENT_PROFILE */
> +
> #undef _TRACE_PROFILE_INIT
>
> diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
> index 06d210c..93f4312 100644
> --- a/kernel/perf_counter.c
> +++ b/kernel/perf_counter.c
> @@ -2646,6 +2646,7 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
> u64 counter;
> } group_entry;
> struct perf_callchain_entry *callchain = NULL;
> + struct perf_tracepoint_record *tp;
> int callchain_size = 0;
> u64 time;
> struct {
> @@ -2714,6 +2715,11 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
> header.size += sizeof(u64);
> }
>
> + if (sample_type & PERF_SAMPLE_TP_RECORD) {
> + tp = data->private;
> + header.size += tp->size;
> + }
> +
> ret = perf_output_begin(&handle, counter, header.size, nmi, 1);
> if (ret)
> return;
> @@ -2777,6 +2783,9 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
> }
> }
>
> + if (sample_type & PERF_SAMPLE_TP_RECORD)
> + perf_output_copy(&handle, tp->record, tp->size);
> +
> perf_output_end(&handle);
> }
You seem to fail to round up to a multiple of u64 somewhere along the
line, that'll mess things up as events are supposed to be u64 aligned.
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 6da0992..90c9808 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -412,6 +412,7 @@ static void create_counter(int counter, int cpu, pid_t pid)
> if (call_graph)
> attr->sample_type |= PERF_SAMPLE_CALLCHAIN;
>
> +
> attr->mmap = track;
> attr->comm = track;
> attr->inherit = (cpu < 0) && inherit;
Do we really need that extra whitespace?
* Re: [PATCH] perfcounters: Support for ftrace event records sampling
2009-08-07 10:37 ` [PATCH] perfcounters: " Peter Zijlstra
@ 2009-08-07 10:58 ` Ingo Molnar
2009-08-07 20:09 ` Peter Zijlstra
2009-08-07 20:26 ` Frederic Weisbecker
2 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2009-08-07 10:58 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, LKML, Arnaldo Carvalho de Melo,
Mike Galbraith, Steven Rostedt, Paul Mackerras, Pekka Enberg,
Gabriel Munteanu, Li Zefan, Lai Jiangshan
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Fri, 2009-08-07 at 01:25 +0200, Frederic Weisbecker wrote:
> > This patch brings the kernel side support for ftrace event record
> > sampling.
> >
> > A new counter attribute is added: PERF_SAMPLE_TP_RECORD which requests
> > ftrace events record sampling.
> >
>
> > + PERF_SAMPLE_TP_RECORD = 1U << 10,
>
> I'd really want this thing called PERF_SAMPLE_RAW
yeah, i'd agree. I suggested to Frederic to use TP_RECORD but in
hindsight we are better off handling PERF_SAMPLE_RAW as a 'non-ABI'
embellishment of samples, with no firmly defined content.
In the tracepoint case, that extra raw content is defined by:
/debug/tracing/events/*/format
and subject to changes.
Ingo
* Re: [PATCH] perfcounters: Support for ftrace event records sampling
2009-08-07 10:37 ` [PATCH] perfcounters: " Peter Zijlstra
2009-08-07 10:58 ` Ingo Molnar
@ 2009-08-07 20:09 ` Peter Zijlstra
2009-08-07 20:21 ` Frederic Weisbecker
2009-08-07 20:26 ` Frederic Weisbecker
2 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2009-08-07 20:09 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Mike Galbraith,
Steven Rostedt, Paul Mackerras, Pekka Enberg, Gabriel Munteanu,
Li Zefan, Lai Jiangshan
On Fri, 2009-08-07 at 12:38 +0200, Peter Zijlstra wrote:
> > +static void ftrace_profile_##call(proto) \
> > +{ \
> > + struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> > + struct ftrace_event_call *event_call = &event_##call; \
> > + extern void perf_tpcounter_event(int, u64, u64, void *, int); \
> > + struct ftrace_raw_##call *entry; \
> > + u64 __addr = 0, __count = 1; \
> > + unsigned long irq_flags; \
> > + int __entry_size; \
> > + int __data_size; \
> > + int pc; \
> > + \
> > + local_save_flags(irq_flags); \
> > + pc = preempt_count(); \
> > + \
> > + __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> > + __entry_size = __data_size + sizeof(*entry); \
> > + \
> > + do { \
> > + char raw_data[__entry_size]; \
> > + struct trace_entry *ent; \
> > + \
> > + entry = (struct ftrace_raw_##call *)raw_data; \
> > + ent = &entry->ent; \
> > + tracing_generic_entry_update(ent, irq_flags, pc); \
> > + ent->type = event_call->id; \
> > + \
> > + tstruct \
> > + \
> > + { assign; } \
> > + \
> > + perf_tpcounter_event(event_call->id, __addr, __count, entry,\
> > + __entry_size); \
> > + } while (0); \
> > + \
> > +}
>
> ok, so the one concern I have here is that the data needs to fit on the
> stack. What if someone puts a large string in the data?
Also, how NMI safe is all that?
* Re: [PATCH] perfcounters: Support for ftrace event records sampling
2009-08-07 20:09 ` Peter Zijlstra
@ 2009-08-07 20:21 ` Frederic Weisbecker
2009-08-07 20:28 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2009-08-07 20:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Mike Galbraith,
Steven Rostedt, Paul Mackerras, Pekka Enberg, Gabriel Munteanu,
Li Zefan, Lai Jiangshan
On Fri, Aug 07, 2009 at 10:09:07PM +0200, Peter Zijlstra wrote:
> On Fri, 2009-08-07 at 12:38 +0200, Peter Zijlstra wrote:
> > > +static void ftrace_profile_##call(proto) \
> > > +{ \
> > > + struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> > > + struct ftrace_event_call *event_call = &event_##call; \
> > > + extern void perf_tpcounter_event(int, u64, u64, void *, int); \
> > > + struct ftrace_raw_##call *entry; \
> > > + u64 __addr = 0, __count = 1; \
> > > + unsigned long irq_flags; \
> > > + int __entry_size; \
> > > + int __data_size; \
> > > + int pc; \
> > > + \
> > > + local_save_flags(irq_flags); \
> > > + pc = preempt_count(); \
> > > + \
> > > + __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> > > + __entry_size = __data_size + sizeof(*entry); \
> > > + \
> > > + do { \
> > > + char raw_data[__entry_size]; \
> > > + struct trace_entry *ent; \
> > > + \
> > > + entry = (struct ftrace_raw_##call *)raw_data; \
> > > + ent = &entry->ent; \
> > > + tracing_generic_entry_update(ent, irq_flags, pc); \
> > > + ent->type = event_call->id; \
> > > + \
> > > + tstruct \
> > > + \
> > > + { assign; } \
> > > + \
> > > + perf_tpcounter_event(event_call->id, __addr, __count, entry,\
> > > + __entry_size); \
> > > + } while (0); \
> > > + \
> > > +}
> >
> > ok, so the one concern I have here is that the data needs to fit on the
> > stack. What if someone puts a large string in the data?
>
> Also, how NMI safe is all that?
Everything is allocated on the stack. It's NMI-safe AFAICS; am I missing
something?
* Re: [PATCH] perfcounters: Support for ftrace event records sampling
2009-08-07 20:21 ` Frederic Weisbecker
@ 2009-08-07 20:28 ` Peter Zijlstra
2009-08-07 20:36 ` Frederic Weisbecker
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2009-08-07 20:28 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Mike Galbraith,
Steven Rostedt, Paul Mackerras, Pekka Enberg, Gabriel Munteanu,
Li Zefan, Lai Jiangshan
On Fri, 2009-08-07 at 22:21 +0200, Frederic Weisbecker wrote:
> On Fri, Aug 07, 2009 at 10:09:07PM +0200, Peter Zijlstra wrote:
> > On Fri, 2009-08-07 at 12:38 +0200, Peter Zijlstra wrote:
> > > > +static void ftrace_profile_##call(proto) \
> > > > +{ \
> > > > + struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> > > > + struct ftrace_event_call *event_call = &event_##call; \
> > > > + extern void perf_tpcounter_event(int, u64, u64, void *, int); \
> > > > + struct ftrace_raw_##call *entry; \
> > > > + u64 __addr = 0, __count = 1; \
> > > > + unsigned long irq_flags; \
> > > > + int __entry_size; \
> > > > + int __data_size; \
> > > > + int pc; \
> > > > + \
> > > > + local_save_flags(irq_flags); \
> > > > + pc = preempt_count(); \
> > > > + \
> > > > + __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> > > > + __entry_size = __data_size + sizeof(*entry); \
> > > > + \
> > > > + do { \
> > > > + char raw_data[__entry_size]; \
> > > > + struct trace_entry *ent; \
> > > > + \
> > > > + entry = (struct ftrace_raw_##call *)raw_data; \
> > > > + ent = &entry->ent; \
> > > > + tracing_generic_entry_update(ent, irq_flags, pc); \
> > > > + ent->type = event_call->id; \
> > > > + \
> > > > + tstruct \
> > > > + \
> > > > + { assign; } \
> > > > + \
> > > > + perf_tpcounter_event(event_call->id, __addr, __count, entry,\
> > > > + __entry_size); \
> > > > + } while (0); \
> > > > + \
> > > > +}
> > >
> > > ok, so the one concern I have here is that the data needs to fit on the
> > > stack. What if someone puts a large string in the data?
> >
> > Also, how NMI safe is all that?
>
>
> > Everything is allocated on the stack. It's NMI-safe AFAICS; am I missing
> > something?
I've no idea what all those trace calls do, I was hoping you would and
say, sure, no problem ;-)
* Re: [PATCH] perfcounters: Support for ftrace event records sampling
2009-08-07 20:28 ` Peter Zijlstra
@ 2009-08-07 20:36 ` Frederic Weisbecker
0 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2009-08-07 20:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Mike Galbraith,
Steven Rostedt, Paul Mackerras, Pekka Enberg, Gabriel Munteanu,
Li Zefan, Lai Jiangshan
On Fri, Aug 07, 2009 at 10:28:51PM +0200, Peter Zijlstra wrote:
> On Fri, 2009-08-07 at 22:21 +0200, Frederic Weisbecker wrote:
> > On Fri, Aug 07, 2009 at 10:09:07PM +0200, Peter Zijlstra wrote:
> > > On Fri, 2009-08-07 at 12:38 +0200, Peter Zijlstra wrote:
> > > > > +static void ftrace_profile_##call(proto) \
> > > > > +{ \
> > > > > + struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> > > > > + struct ftrace_event_call *event_call = &event_##call; \
> > > > > + extern void perf_tpcounter_event(int, u64, u64, void *, int); \
> > > > > + struct ftrace_raw_##call *entry; \
> > > > > + u64 __addr = 0, __count = 1; \
> > > > > + unsigned long irq_flags; \
> > > > > + int __entry_size; \
> > > > > + int __data_size; \
> > > > > + int pc; \
> > > > > + \
> > > > > + local_save_flags(irq_flags); \
> > > > > + pc = preempt_count(); \
> > > > > + \
> > > > > + __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> > > > > + __entry_size = __data_size + sizeof(*entry); \
> > > > > + \
> > > > > + do { \
> > > > > + char raw_data[__entry_size]; \
> > > > > + struct trace_entry *ent; \
> > > > > + \
> > > > > + entry = (struct ftrace_raw_##call *)raw_data; \
> > > > > + ent = &entry->ent; \
> > > > > + tracing_generic_entry_update(ent, irq_flags, pc); \
> > > > > + ent->type = event_call->id; \
> > > > > + \
> > > > > + tstruct \
> > > > > + \
> > > > > + { assign; } \
> > > > > + \
> > > > > + perf_tpcounter_event(event_call->id, __addr, __count, entry,\
> > > > > + __entry_size); \
> > > > > + } while (0); \
> > > > > + \
> > > > > +}
> > > >
> > > > ok, so the one concern I have here is that the data needs to fit on the
> > > > stack. What if someone puts a large string in the data?
> > >
> > > Also, how NMI safe is all that?
> >
> >
> > Everything is allocated on the stack. It's NMI-safe AFAICS; am I missing
> > something?
>
> I've no idea what all those trace calls do, I was hoping you would and
> say, sure, no problem ;-)
Ah ok :)

To sum up: everything there is allocated on the stack. That's an issue,
as you reported, but besides that it makes it all NMI-safe.

ftrace_get_offsets_<call>() gets the __dynamic_array/__string things
and computes the total _local_ string length.

It also fills __data_offsets with the dynamic arrays' offsets;
__data_offsets is also on the stack.

tracing_generic_entry_update() fills *ent (stack allocated) with the
standard information (irq_flags, preempt_count, etc.).

And that's it :)
* Re: [PATCH] perfcounters: Support for ftrace event records sampling
2009-08-07 10:37 ` [PATCH] perfcounters: " Peter Zijlstra
2009-08-07 10:58 ` Ingo Molnar
2009-08-07 20:09 ` Peter Zijlstra
@ 2009-08-07 20:26 ` Frederic Weisbecker
2 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2009-08-07 20:26 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Mike Galbraith,
Steven Rostedt, Paul Mackerras, Pekka Enberg, Gabriel Munteanu,
Li Zefan, Lai Jiangshan
On Fri, Aug 07, 2009 at 12:37:57PM +0200, Peter Zijlstra wrote:
> On Fri, 2009-08-07 at 01:25 +0200, Frederic Weisbecker wrote:
> > This patch brings the kernel side support for ftrace event record
> > sampling.
> >
> > A new counter attribute is added: PERF_SAMPLE_TP_RECORD which requests
> > ftrace events record sampling.
> >
>
> > + PERF_SAMPLE_TP_RECORD = 1U << 10,
>
> I'd really want this thing called PERF_SAMPLE_RAW
>
> > - PERF_SAMPLE_MAX = 1U << 10, /* non-ABI */
> > + PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
> > };
> >
> > /*
> > @@ -413,6 +414,11 @@ struct perf_callchain_entry {
> > __u64 ip[PERF_MAX_STACK_DEPTH];
> > };
> >
> > +struct perf_tracepoint_record {
> > + int size;
> > + char *record;
> > +};
>
> Which would make this:
>
> struct perf_raw_record {
> u32 size;
> void *data;
> };
>
> > struct task_struct;
> >
> > /**
> > @@ -681,6 +687,7 @@ struct perf_sample_data {
> > struct pt_regs *regs;
> > u64 addr;
> > u64 period;
> > + void *private;
> > };
>
> might as well make that struct perf_raw_record *raw;
Ok. Yeah that's far more generic (and typed).
>
> > @@ -649,5 +617,99 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
> >
> > #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> >
> > +/*
> > + * Define the insertion callback to profile events
> > + *
> > + * The job is very similar to ftrace_raw_event_<call> except that we don't
> > + * insert in the ring buffer but in a perf counter.
> > + *
> > + * static void ftrace_profile_<call>(proto)
> > + * {
> > + * struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
> > + * struct ftrace_event_call *event_call = &event_<call>;
> > + * extern void perf_tpcounter_event(int, u64, u64, void *, int);
> > + * struct ftrace_raw_##call *entry;
> > + * u64 __addr = 0, __count = 1;
> > + * unsigned long irq_flags;
> > + * int __entry_size;
> > + * int __data_size;
> > + * int pc;
> > + *
> > + * local_save_flags(irq_flags);
> > + * pc = preempt_count();
> > + *
> > + * __data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
> > + * __entry_size = __data_size + sizeof(*entry);
> > + *
> > + * do {
> > + * char raw_data[__entry_size]; <- allocate our sample in the stack
> > + * struct trace_entry *ent;
> > + *
> > + * entry = (struct ftrace_raw_<call> *)raw_data;
> > + * ent = &entry->ent;
> > + * tracing_generic_entry_update(ent, irq_flags, pc);
> > + * ent->type = event_call->id;
> > + *
> > + * <tstruct> <- do some jobs with dynamic arrays
> > + *
> > + * <assign> <- affect our values
> > + *
> > + * perf_tpcounter_event(event_call->id, __addr, __count, entry,
> > + * __entry_size); <- submit them to perf counter
> > + * } while (0);
> > + *
> > + * }
> > + */
> > +
> > +#ifdef CONFIG_EVENT_PROFILE
> > +
> > +#undef __perf_addr
> > +#define __perf_addr(a) __addr = (a)
> > +
> > +#undef __perf_count
> > +#define __perf_count(c) __count = (c)
> > +
> > +#undef TRACE_EVENT
> > +#define TRACE_EVENT(call, proto, args, tstruct, assign, print) \
> > +static void ftrace_profile_##call(proto) \
> > +{ \
> > + struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> > + struct ftrace_event_call *event_call = &event_##call; \
> > + extern void perf_tpcounter_event(int, u64, u64, void *, int); \
> > + struct ftrace_raw_##call *entry; \
> > + u64 __addr = 0, __count = 1; \
> > + unsigned long irq_flags; \
> > + int __entry_size; \
> > + int __data_size; \
> > + int pc; \
> > + \
> > + local_save_flags(irq_flags); \
> > + pc = preempt_count(); \
> > + \
> > + __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> > + __entry_size = __data_size + sizeof(*entry); \
> > + \
> > + do { \
> > + char raw_data[__entry_size]; \
> > + struct trace_entry *ent; \
> > + \
> > + entry = (struct ftrace_raw_##call *)raw_data; \
> > + ent = &entry->ent; \
> > + tracing_generic_entry_update(ent, irq_flags, pc); \
> > + ent->type = event_call->id; \
> > + \
> > + tstruct \
> > + \
> > + { assign; } \
> > + \
> > + perf_tpcounter_event(event_call->id, __addr, __count, entry,\
> > + __entry_size); \
> > + } while (0); \
> > + \
> > +}
>
> ok, so the one concern I have here is that the data needs to fit on the
> stack. What if someone puts a large string in the data?
Yeah, I'll try to fix this by storing strings as pointers and copying
them only in the perf counter output path.

TODO-listed, with all the above.
> > +
> > +#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> > +#endif /* CONFIG_EVENT_PROFILE */
> > +
> > #undef _TRACE_PROFILE_INIT
> >
> > diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
> > index 06d210c..93f4312 100644
> > --- a/kernel/perf_counter.c
> > +++ b/kernel/perf_counter.c
> > @@ -2646,6 +2646,7 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
> > u64 counter;
> > } group_entry;
> > struct perf_callchain_entry *callchain = NULL;
> > + struct perf_tracepoint_record *tp;
> > int callchain_size = 0;
> > u64 time;
> > struct {
> > @@ -2714,6 +2715,11 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
> > header.size += sizeof(u64);
> > }
> >
> > + if (sample_type & PERF_SAMPLE_TP_RECORD) {
> > + tp = data->private;
> > + header.size += tp->size;
> > + }
> > +
> > ret = perf_output_begin(&handle, counter, header.size, nmi, 1);
> > if (ret)
> > return;
> > @@ -2777,6 +2783,9 @@ static void perf_counter_output(struct perf_counter *counter, int nmi,
> > }
> > }
> >
> > + if (sample_type & PERF_SAMPLE_TP_RECORD)
> > + perf_output_copy(&handle, tp->record, tp->size);
> > +
> > perf_output_end(&handle);
> > }
>
> You seem to fail to round up to a multiple of u64 somewhere along the
> line, that'll mess things up as events are supposed to be u64 aligned.
Indeed.
>
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 6da0992..90c9808 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -412,6 +412,7 @@ static void create_counter(int counter, int cpu, pid_t pid)
> > if (call_graph)
> > attr->sample_type |= PERF_SAMPLE_CALLCHAIN;
> >
> > +
> > attr->mmap = track;
> > attr->comm = track;
> > attr->inherit = (cpu < 0) && inherit;
>
> Do we really need that extra whitespace?
Sorry, I added the explicit PERF_SAMPLE_TP_RECORD here to test this patch,
and left the extra newline behind when I removed it :-s
* Re: [PATCH] perfcounters: Support for ftrace event records sampling
2009-08-06 23:25 [PATCH] perfcounters: Support for ftrace event records sampling Frederic Weisbecker
2009-08-07 7:33 ` [tip:perfcounters/tracing] perf_counter: " tip-bot for Frederic Weisbecker
2009-08-07 10:37 ` [PATCH] perfcounters: " Peter Zijlstra
@ 2009-08-07 15:54 ` Peter Zijlstra
2009-08-07 16:18 ` [tip:perfcounters/core] perfcounters: Support for ftrace event records sampling, fix modules tip-bot for Peter Zijlstra
2 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2009-08-07 15:54 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Mike Galbraith,
Steven Rostedt, Paul Mackerras, Pekka Enberg, Gabriel Munteanu,
Li Zefan, Lai Jiangshan
On Fri, 2009-08-07 at 01:25 +0200, Frederic Weisbecker wrote:
> +static void ftrace_profile_##call(proto) \
> +{ \
> + struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> + struct ftrace_event_call *event_call = &event_##call; \
> + extern void perf_tpcounter_event(int, u64, u64, void *, int); \
> + struct ftrace_raw_##call *entry; \
> + u64 __addr = 0, __count = 1; \
> + unsigned long irq_flags; \
> + int __entry_size; \
> + int __data_size; \
> + int pc; \
> + \
> + local_save_flags(irq_flags); \
> + pc = preempt_count(); \
> + \
> + __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> + __entry_size = __data_size + sizeof(*entry); \
> + \
> + do { \
> + char raw_data[__entry_size]; \
> + struct trace_entry *ent; \
> + \
> + entry = (struct ftrace_raw_##call *)raw_data; \
> + ent = &entry->ent; \
> + tracing_generic_entry_update(ent, irq_flags, pc); \
> + ent->type = event_call->id; \
> + \
> + tstruct \
> + \
> + { assign; } \
> + \
> + perf_tpcounter_event(event_call->id, __addr, __count, entry,\
> + __entry_size); \
> + } while (0); \
Causes:
ERROR: "tracing_generic_entry_update" [fs/jbd2/jbd2.ko] undefined!
ERROR: "tracing_generic_entry_update" [fs/ext4/ext4.ko] undefined!
Which will be fixed by:
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/trace/trace.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6344573..e793cda 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -848,6 +848,7 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,
((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
(need_resched() ? TRACE_FLAG_NEED_RESCHED : 0);
}
+EXPORT_SYMBOL_GPL(tracing_generic_entry_update);
struct ring_buffer_event *trace_buffer_lock_reserve(struct trace_array *tr,
int type,
* [tip:perfcounters/core] perfcounters: Support for ftrace event records sampling, fix modules
2009-08-07 15:54 ` Peter Zijlstra
@ 2009-08-07 16:18 ` tip-bot for Peter Zijlstra
0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Peter Zijlstra @ 2009-08-07 16:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, acme, hpa, mingo, lizf, penberg,
a.p.zijlstra, efault, eduard.munteanu, fweisbec, rostedt, tglx,
laijs, mingo
Commit-ID: 11c1870ca7e8dfede9f8aab654814a7a67378857
Gitweb: http://git.kernel.org/tip/11c1870ca7e8dfede9f8aab654814a7a67378857
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Fri, 7 Aug 2009 17:54:38 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 7 Aug 2009 18:14:58 +0200
perfcounters: Support for ftrace event records sampling, fix modules
Fix:
ERROR: "tracing_generic_entry_update" [fs/jbd2/jbd2.ko] undefined!
ERROR: "tracing_generic_entry_update" [fs/ext4/ext4.ko] undefined!
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1249660478.32113.736.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/trace/trace.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8930e39..c22b40f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -848,6 +848,7 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,
((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
(need_resched() ? TRACE_FLAG_NEED_RESCHED : 0);
}
+EXPORT_SYMBOL_GPL(tracing_generic_entry_update);
struct ring_buffer_event *trace_buffer_lock_reserve(struct trace_array *tr,
int type,