From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755822Ab0EXT0V (ORCPT );
	Mon, 24 May 2010 15:26:21 -0400
Received: from mx2.mail.elte.hu ([157.181.151.9]:46467 "EHLO mx2.mail.elte.hu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753047Ab0EXT0Q (ORCPT );
	Mon, 24 May 2010 15:26:16 -0400
Date: Mon, 24 May 2010 21:25:31 +0200
From: Ingo Molnar
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Steven Rostedt, Paul Mackerras, Thomas Gleixner,
	Andrew Morton
Subject: [GIT PULL] perf changes
Message-ID: <20100524192531.GA15614@elte.hu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-08-17)
X-ELTE-SpamScore: -2.0
X-ELTE-SpamLevel: 
X-ELTE-SpamCheck: no
X-ELTE-SpamVersion: ELTE 2.0
X-ELTE-SpamCheck-Details: score=-2.0 required=5.9 tests=BAYES_00 autolearn=no
	SpamAssassin version=3.2.5 -2.0 BAYES_00 BODY: Bayesian spam
	probability is 0 to 1% [score: 0.0000]
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

Linus,

Please pull the latest perf-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git perf-core-for-linus

Most of these changes are fixes and dead-code removal, but there is
also a set of optimizations, a series of tracing size-optimization
bits that we kept pending because they conflicted with the previous
push - and some TUI enhancements.

 Thanks,

	Ingo

------------------>
Akinobu Mita (1):
      x86/mmiotrace: Remove redundant instruction prefix checks

Arnaldo Carvalho de Melo (15):
      perf tui: Fix build problem with slang <= 2.0.6
      perf tools: Remove some unused functions
      perf probe: Fix some error exit paths
      perf probe: Don't call die()
      perf tools: remove xstrndup, xmalloc, xzalloc
      perf symbols: Don't try to read the build-id twice
      perf session: Make read_build_id routines look at the host_machine too
      perf TUI: Make 'space' be an alias to 'PgDn'
      perf annotate: Use build-ids to find the right DSO
      perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
      perf report: Don't start the TUI if -D is used
      perf tui: Remove annotate from popup menu after failure
      perf annotate: Add TUI interface
      perf annotate: Fix up usage of the build id cache
      perf report: Support multiple events on the TUI

Cyrill Gorcunov (5):
      perf, x86: P4 PMU -- handle unflagged events
      perf, x86: P4 PMU -- fix typo in unflagged NMI handling
      perf, x86: P4 PMU -- do a real check for ESCR address being in hash
      perf, x86: P4_pmu_schedule_events -- use smp_processor_id instead of raw_
      perf, x86: P4 PMU -- add missing bit in CCCR mask

Frederic Weisbecker (4):
      perf: Comply with new rcu checks API
      perf: Fix unaligned accesses while fetching trace values
      perf: Fix forgotten preempt_enable by nested writers
      perf: Fix getline undeclared

Li Zefan (1):
      tracing: Fix function declarations if !CONFIG_STACKTRACE

Lin Ming (1):
      perf, sparc: Implement group scheduling transactional APIs

Masami Hiramatsu (1):
      perf probe: Don't compile CFI related code if elfutils is old

Mathieu Desnoyers (1):
      tracepoints: Add check trace callback type

Peter Zijlstra (16):
      perf/ftrace: Optimize perf/tracepoint interaction for single events
      perf: Optimize buffer placement by allocating buffers NUMA aware
      perf: Disallow mmap() on per-task inherited events
      perf: Optimize the perf_output() path by removing IRQ-disables
      perf: Optimize the hotpath by converting the perf output buffer to local_t
      perf: Optimize perf_output_*() by avoiding local_xchg()
      perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
      perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
      perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
      perf-record: Remove -M
      perf-record: Share per-cpu buffers
      perf: Fix wakeup storm for RO mmap()s
      perf: Optimize perf_output_copy()
      perf: Optimize the !vmalloc backed buffer
      perf: Remove more code from the fastpath
      perf: Optimize perf_tp_event_match()

Russ Anderson (1):
      perf record: remove unneeded gettimeofday() call

Stephane Eranian (2):
      perf: Fix errors path in perf_output_begin()
      perf stat: add perf stat -B to pretty print large numbers

Steven Rostedt (10):
      tracing: Create class struct for events
      tracing: Let tracepoints have data passed to tracepoint callbacks
      tracing: Remove per event trace registering
      tracing: Move fields from event to class structure
      tracing: Move raw_init from events to class
      tracing: Allow events to share their print functions
      tracing: Move print functions into event class
      tracing: Remove duplicate id information in event structure
      tracing: Combine event filter_active and enable into single flags field
      tracing: Comment the use of event_mutex with trace event flags

Tom Zanussi (1):
      perf: Use read() instead of lseek() in trace_event_read.c:skip()


 arch/sparc/kernel/perf_event.c                 |  108 ++++---
 arch/x86/include/asm/perf_event_p4.h           |    3 +-
 arch/x86/kernel/cpu/perf_event_p4.c            |   41 ++-
 arch/x86/mm/pf_in.c                            |    2 +-
 include/linux/ftrace_event.h                   |  103 ++++--
 include/linux/perf_event.h                     |   28 +-
 include/linux/syscalls.h                       |   57 +--
 include/linux/tracepoint.h                     |   98 ++++--
 include/trace/ftrace.h                         |  247 +++++-------
 include/trace/syscall.h                        |   10 +-
 kernel/perf_event.c                            |  389 +++++++++++---------
 kernel/trace/blktrace.c                        |  138 ++++---
 kernel/trace/ftrace.c                          |    7 +-
 kernel/trace/kmemtrace.c                       |   70 +++--
 kernel/trace/trace.c                           |    9 +-
 kernel/trace/trace.h                           |    9 +-
 kernel/trace/trace_branch.c                    |    8 +-
 kernel/trace/trace_event_perf.c                |  183 +++++-----
 kernel/trace/trace_events.c                    |  139 +++++---
 kernel/trace/trace_events_filter.c             |   28 +-
 kernel/trace/trace_export.c                    |   16 +-
 kernel/trace/trace_functions_graph.c           |   13 +-
 kernel/trace/trace_kprobe.c                    |  113 ++++---
 kernel/trace/trace_output.c                    |  137 +++++---
 kernel/trace/trace_output.h                    |    2 +-
 kernel/trace/trace_sched_switch.c              |   20 +-
 kernel/trace/trace_sched_wakeup.c              |   28 +-
 kernel/trace/trace_syscalls.c                  |  146 ++++++--
 kernel/trace/trace_workqueue.c                 |   26 +-
 kernel/tracepoint.c                            |   91 +++--
 net/core/drop_monitor.c                        |   12 +-
 samples/tracepoints/tp-samples-trace.h         |    4 +-
 samples/tracepoints/tracepoint-probe-sample.c  |   13 +-
 samples/tracepoints/tracepoint-probe-sample2.c |    7 +-
 tools/perf/Documentation/perf-stat.txt         |    3 +
 tools/perf/builtin-annotate.c                  |   61 +++-
 tools/perf/builtin-probe.c                     |   10 +-
 tools/perf/builtin-record.c                    |   72 ++---
 tools/perf/builtin-report.c                    |   64 ++--
 tools/perf/builtin-stat.c                      |   18 +-
 tools/perf/perf.c                              |   25 ++-
 tools/perf/util/abspath.c                      |   81 -----
 tools/perf/util/build-id.c                     |   22 ++
 tools/perf/util/build-id.h                     |    2 +
 tools/perf/util/cache.h                        |   57 +--
 tools/perf/util/callchain.c                    |    1 +
 tools/perf/util/callchain.h                    |    1 -
 tools/perf/util/config.c                       |  461 +-----------------------
 tools/perf/util/exec_cmd.c                     |    6 +-
 tools/perf/util/exec_cmd.h                     |    1 -
 tools/perf/util/header.c                       |   84 +++--
 tools/perf/util/help.c                         |   30 +--
 tools/perf/util/hist.c                         |   42 ++-
 tools/perf/util/hist.h                         |   24 ++-
150 ++++++-- tools/perf/util/path.c | 204 +----------- tools/perf/util/probe-finder.c | 33 ++- tools/perf/util/probe-finder.h | 3 + tools/perf/util/quote.c | 433 +---------------------- tools/perf/util/quote.h | 39 -- tools/perf/util/run-command.c | 90 ----- tools/perf/util/run-command.h | 30 -- tools/perf/util/session.c | 8 + tools/perf/util/session.h | 8 +- tools/perf/util/sigchain.c | 2 +- tools/perf/util/sigchain.h | 1 - tools/perf/util/strbuf.c | 229 +------------ tools/perf/util/strbuf.h | 45 --- tools/perf/util/symbol.c | 25 +- tools/perf/util/symbol.h | 1 + tools/perf/util/trace-event-read.c | 19 +- tools/perf/util/trace-event.h | 7 +- tools/perf/util/util.h | 177 +--------- tools/perf/util/wrapper.c | 110 ------ 74 files changed, 1839 insertions(+), 3145 deletions(-) diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c index e277193..cf4ce26 100644 --- a/arch/sparc/kernel/perf_event.c +++ b/arch/sparc/kernel/perf_event.c @@ -91,6 +91,8 @@ struct cpu_hw_events { /* Enabled/disable state. */ int enabled; + + unsigned int group_flag; }; DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = { .enabled = 1, }; @@ -980,53 +982,6 @@ static int collect_events(struct perf_event *group, int max_count, return n; } -static void event_sched_in(struct perf_event *event) -{ - event->state = PERF_EVENT_STATE_ACTIVE; - event->oncpu = smp_processor_id(); - event->tstamp_running += event->ctx->time - event->tstamp_stopped; - if (is_software_event(event)) - event->pmu->enable(event); -} - -int hw_perf_group_sched_in(struct perf_event *group_leader, - struct perf_cpu_context *cpuctx, - struct perf_event_context *ctx) -{ - struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); - struct perf_event *sub; - int n0, n; - - if (!sparc_pmu) - return 0; - - n0 = cpuc->n_events; - n = collect_events(group_leader, perf_max_events - n0, - &cpuc->event[n0], &cpuc->events[n0], - &cpuc->current_idx[n0]); - if (n < 0) - return -EAGAIN; - if (check_excludes(cpuc->event, n0, n)) - return -EINVAL; - if (sparc_check_constraints(cpuc->event, cpuc->events, n + n0)) - return -EAGAIN; - cpuc->n_events = n0 + n; - cpuc->n_added += n; - - cpuctx->active_oncpu += n; - n = 1; - event_sched_in(group_leader); - list_for_each_entry(sub, &group_leader->sibling_list, group_entry) { - if (sub->state != PERF_EVENT_STATE_OFF) { - event_sched_in(sub); - n++; - } - } - ctx->nr_active += n; - - return 1; -} - static int sparc_pmu_enable(struct perf_event *event) { struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); @@ -1044,11 +999,20 @@ static int sparc_pmu_enable(struct perf_event *event) cpuc->events[n0] = event->hw.event_base; cpuc->current_idx[n0] = PIC_NO_INDEX; + /* + * If group events scheduling transaction was started, + * skip the schedulability test here, it will be peformed + * at commit time(->commit_txn) as a whole + */ + if (cpuc->group_flag & PERF_EVENT_TXN_STARTED) + goto nocheck; + if (check_excludes(cpuc->event, n0, 1)) goto out; if (sparc_check_constraints(cpuc->event, cpuc->events, n0 + 1)) goto out; +nocheck: cpuc->n_events++; cpuc->n_added++; @@ -1128,11 +1092,61 @@ static int __hw_perf_event_init(struct perf_event *event) return 0; } +/* + * Start group events scheduling transaction + * Set the flag to make pmu::enable() not perform the + * schedulability test, it will be performed at commit time + */ +static void sparc_pmu_start_txn(const struct pmu *pmu) +{ + struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events); + + cpuhw->group_flag |= PERF_EVENT_TXN_STARTED; +} + +/* 
diff --git a/arch/x86/include/asm/perf_event_p4.h b/arch/x86/include/asm/perf_event_p4.h
index b05400a..64a8ebf 100644
--- a/arch/x86/include/asm/perf_event_p4.h
+++ b/arch/x86/include/asm/perf_event_p4.h
@@ -89,7 +89,8 @@
 	 P4_CCCR_ENABLE)
 
 /* HT mask */
-#define P4_CCCR_MASK_HT	(P4_CCCR_MASK | P4_CCCR_THREAD_ANY)
+#define P4_CCCR_MASK_HT				\
+	(P4_CCCR_MASK | P4_CCCR_OVF_PMI_T1 | P4_CCCR_THREAD_ANY)
 
 #define P4_GEN_ESCR_EMASK(class, name, bit) \
 	class##__##name = ((1 << bit) << P4_ESCR_EVENTMASK_SHIFT)
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 424fc8d..ae85d69 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -465,15 +465,21 @@ out:
 	return rc;
 }
 
-static inline void p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
+static inline int p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
 {
-	unsigned long dummy;
+	int overflow = 0;
+	u32 low, high;
 
-	rdmsrl(hwc->config_base + hwc->idx, dummy);
-	if (dummy & P4_CCCR_OVF) {
+	rdmsr(hwc->config_base + hwc->idx, low, high);
+
+	/* we need to check high bit for unflagged overflows */
+	if ((low & P4_CCCR_OVF) || !(high & (1 << 31))) {
+		overflow = 1;
 		(void)checking_wrmsrl(hwc->config_base + hwc->idx,
-			((u64)dummy) & ~P4_CCCR_OVF);
+			((u64)low) & ~P4_CCCR_OVF);
 	}
+
+	return overflow;
 }
 
 static inline void p4_pmu_disable_event(struct perf_event *event)
@@ -584,21 +590,15 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 
 		WARN_ON_ONCE(hwc->idx != idx);
 
-		/*
-		 * FIXME: Redundant call, actually not needed
-		 * but just to check if we're screwed
-		 */
-		p4_pmu_clear_cccr_ovf(hwc);
+		/* it might be unflagged overflow */
+		handled = p4_pmu_clear_cccr_ovf(hwc);
 
 		val = x86_perf_event_update(event);
-		if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
+		if (!handled && (val & (1ULL << (x86_pmu.cntval_bits - 1))))
 			continue;
 
-		/*
-		 * event overflow
-		 */
-		handled = 1;
-		data.period = event->hw.last_period;
+		/* event overflow for sure */
+		data.period = event->hw.last_period;
 
 		if (!x86_perf_event_set_period(event))
 			continue;
@@ -670,7 +670,7 @@ static void p4_pmu_swap_config_ts(struct hw_perf_event *hwc, int cpu)
 
 /*
  * ESCR address hashing is tricky, ESCRs are not sequential
- * in memory but all starts from MSR_P4_BSU_ESCR0 (0x03e0) and
+ * in memory but all starts from MSR_P4_BSU_ESCR0 (0x03a0) and
  * the metric between any ESCRs is laid in range [0xa0,0xe1]
 *
 * so we make ~70% filled hashtable
@@ -735,8 +735,9 @@ static int p4_get_escr_idx(unsigned int addr)
 {
 	unsigned int idx = P4_ESCR_MSR_IDX(addr);
 
-	if (unlikely(idx >= P4_ESCR_MSR_TABLE_SIZE ||
-			!p4_escr_table[idx])) {
+	if (unlikely(idx >= P4_ESCR_MSR_TABLE_SIZE	||
+			!p4_escr_table[idx]		||
+			p4_escr_table[idx] != addr)) {
 		WARN_ONCE(1, "P4 PMU: Wrong address passed: %x\n", addr);
 		return -1;
 	}
@@ -762,7 +763,7 @@ static int p4_pmu_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign
 {
 	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	unsigned long escr_mask[BITS_TO_LONGS(P4_ESCR_MSR_TABLE_SIZE)];
-	int cpu = raw_smp_processor_id();
+	int cpu = smp_processor_id();
 	struct hw_perf_event *hwc;
 	struct p4_event_bind *bind;
 	unsigned int i, thread, num;
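[ The p4_get_escr_idx() change deserves a note: a sparse direct-index
  table cannot distinguish "slot occupied" from "slot occupied by this
  key", so the lookup must also compare the stored address. Below is a
  stand-alone sketch of the pattern, with a hypothetical table and a
  modulo hash standing in for P4_ESCR_MSR_IDX(). ]

#include <stdio.h>

#define TABLE_SIZE 46

/* hypothetical sparse table: slot i holds the address hashed to i, or 0 */
static unsigned int addr_table[TABLE_SIZE];

static unsigned int hash(unsigned int addr)
{
	return addr % TABLE_SIZE;	/* stand-in for P4_ESCR_MSR_IDX() */
}

static int get_idx(unsigned int addr)
{
	unsigned int idx = hash(addr);

	/*
	 * Checking "slot is occupied" is not enough: a different
	 * address may hash to the same slot.  Verify the stored
	 * address too, as the patch now does for ESCR MSRs.
	 */
	if (idx >= TABLE_SIZE || !addr_table[idx] || addr_table[idx] != addr)
		return -1;

	return (int)idx;
}

int main(void)
{
	addr_table[hash(0x3a0)] = 0x3a0;
	printf("%d\n", get_idx(0x3a0));			/* valid index */
	printf("%d\n", get_idx(0x3a0 + TABLE_SIZE));	/* collision -> -1 */
	return 0;
}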
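[ The ftrace_event_call diff that follows collapses two int fields
  into one flags word using the usual kernel pattern: one enum for bit
  numbers, one for masks, writers serialized by a mutex, readers
  tolerating briefly stale bits. A self-contained sketch of that
  pattern, with generic names and a pthread mutex standing in for
  event_mutex: ]

#include <pthread.h>
#include <stdio.h>

enum {
	FL_ENABLED_BIT,
	FL_FILTERED_BIT,
};

enum {
	FL_ENABLED	= (1 << FL_ENABLED_BIT),
	FL_FILTERED	= (1 << FL_FILTERED_BIT),
};

struct event {
	unsigned int flags;	/* replaces two separate int fields */
};

static pthread_mutex_t event_mutex = PTHREAD_MUTEX_INITIALIZER;

/* writers serialize on the mutex, like event_mutex in the patch */
static void event_enable(struct event *ev)
{
	pthread_mutex_lock(&event_mutex);
	ev->flags |= FL_ENABLED;
	pthread_mutex_unlock(&event_mutex);
}

int main(void)
{
	struct event ev = { 0 };

	event_enable(&ev);
	/* readers may test bits without the lock, accepting brief staleness */
	printf("enabled=%d filtered=%d\n",
	       !!(ev.flags & FL_ENABLED), !!(ev.flags & FL_FILTERED));
	return 0;
}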
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3fd5c82..fb6c91e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -485,6 +485,7 @@ struct perf_guest_info_callbacks {
 #include <linux/ftrace.h>
 #include <linux/cpu.h>
 #include <asm/atomic.h>
+#include <asm/local.h>
 
 #define PERF_MAX_STACK_DEPTH		255
 
@@ -587,21 +588,19 @@ struct perf_mmap_data {
 	struct rcu_head			rcu_head;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
+	int				page_order;	/* allocation order  */
 #endif
-	int				data_order;
 	int				nr_pages;	/* nr of data pages  */
 	int				writable;	/* are we writable   */
 	int				nr_locked;	/* nr pages mlocked  */
 
 	atomic_t			poll;		/* POLL_ for wakeups */
 
-	atomic_t			events;		/* event_id limit    */
-	atomic_long_t			head;		/* write position    */
-	atomic_long_t			done_head;	/* completed head    */
-
-	atomic_t			lock;		/* concurrent writes */
-	atomic_t			wakeup;		/* needs a wakeup    */
-	atomic_t			lost;		/* nr records lost   */
+	local_t				head;		/* write position    */
+	local_t				nest;		/* nested writers    */
+	local_t				events;		/* event limit       */
+	local_t				wakeup;		/* wakeup stamp      */
+	local_t				lost;		/* nr records lost   */
 
 	long				watermark;	/* wakeup watermark  */
 
@@ -728,6 +727,7 @@ struct perf_event {
 	perf_overflow_handler_t		overflow_handler;
 
 #ifdef CONFIG_EVENT_TRACING
+	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
 #endif
 
@@ -803,11 +803,12 @@ struct perf_cpu_context {
 struct perf_output_handle {
 	struct perf_event		*event;
 	struct perf_mmap_data		*data;
-	unsigned long			head;
-	unsigned long			offset;
+	unsigned long			wakeup;
+	unsigned long			size;
+	void				*addr;
+	int				page;
 	int				nmi;
 	int				sample;
-	int				locked;
 };
 
 #ifdef CONFIG_PERF_EVENTS
@@ -993,8 +994,9 @@ static inline bool perf_paranoid_kernel(void)
 }
 
 extern void perf_event_init(void);
-extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
-			  int entry_size, struct pt_regs *regs);
+extern void perf_tp_event(u64 addr, u64 count, void *record,
+			  int entry_size, struct pt_regs *regs,
+			  struct hlist_head *head);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
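[ The perf_mmap_data diff that follows replaces cross-CPU atomics
  with local_t, which works because a given buffer is only written
  from one CPU; the only concurrency left is a writer being
  interrupted (e.g. by an NMI) by another writer, which a nest
  counter resolves by letting the outermost writer publish the final
  head. A single-threaded user-space sketch of the idea, with plain
  longs standing in for local_t and the memory barriers elided: ]

#include <stdio.h>

struct buffer {
	long head;	/* write position */
	long nest;	/* nesting depth of writers */
};

static void output_begin(struct buffer *b, long size)
{
	b->nest++;		/* outermost writer owns the final publish */
	b->head += size;	/* reserve space for this record */
}

static void output_end(struct buffer *b)
{
	if (--b->nest == 0)
		printf("publish head=%ld\n", b->head);	/* reader-visible */
}

int main(void)
{
	struct buffer b = { 0, 0 };

	output_begin(&b, 16);	/* normal writer */
	output_begin(&b, 8);	/* "NMI" nesting inside the first write */
	output_end(&b);		/* inner writer: head not yet published */
	output_end(&b);		/* outer writer publishes once */
	return 0;
}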
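[ A side note on the .enter_fields/.exit_fields initializers added in
  the syscalls.h diff below: a statically allocated list head is
  initialized by naming the object being defined inside its own
  initializer, which is legal C for objects with static storage
  duration. A minimal stand-alone version of the idiom, with
  simplified types rather than the kernel's macros: ]

#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* same shape as the kernel's LIST_HEAD_INIT(): point at yourself */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

struct syscall_meta {
	const char	 *name;
	struct list_head enter_fields;
};

/* the initializer may reference the object being defined */
static struct syscall_meta meta_open = {
	.name		= "open",
	.enter_fields	= LIST_HEAD_INIT(meta_open.enter_fields),
};

int main(void)
{
	/* an empty list's head points back at itself */
	printf("%s empty=%d\n", meta_open.name,
	       meta_open.enter_fields.next == &meta_open.enter_fields);
	return 0;
}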
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 1d85f9a..9a59d1f 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -20,12 +20,17 @@
 struct module;
 struct tracepoint;
 
+struct tracepoint_func {
+	void *func;
+	void *data;
+};
+
 struct tracepoint {
 	const char *name;		/* Tracepoint name */
 	int state;			/* State. */
 	void (*regfunc)(void);
 	void (*unregfunc)(void);
-	void **funcs;
+	struct tracepoint_func *funcs;
 } __attribute__((aligned(32)));		/*
 					 * Aligned on 32 bytes because it is
 					 * globally visible and gcc happily
@@ -37,16 +42,19 @@ struct tracepoint {
  * Connect a probe to a tracepoint.
  * Internal API, should not be used directly.
  */
-extern int tracepoint_probe_register(const char *name, void *probe);
+extern int tracepoint_probe_register(const char *name, void *probe, void *data);
 
 /*
  * Disconnect a probe from a tracepoint.
  * Internal API, should not be used directly.
  */
-extern int tracepoint_probe_unregister(const char *name, void *probe);
+extern int
+tracepoint_probe_unregister(const char *name, void *probe, void *data);
 
-extern int tracepoint_probe_register_noupdate(const char *name, void *probe);
-extern int tracepoint_probe_unregister_noupdate(const char *name, void *probe);
+extern int tracepoint_probe_register_noupdate(const char *name, void *probe,
+					      void *data);
+extern int tracepoint_probe_unregister_noupdate(const char *name, void *probe,
+						void *data);
 extern void tracepoint_probe_update_all(void);
 
 struct tracepoint_iter {
@@ -102,17 +110,27 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
 /*
  * it_func[0] is never NULL because there is at least one element in the array
  * when the array itself is non NULL.
+ *
+ * Note, the proto and args passed in includes "__data" as the first parameter.
+ * The reason for this is to handle the "void" prototype. If a tracepoint
+ * has a "void" prototype, then it is invalid to declare a function
+ * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
+ * "void *data", whereas the DECLARE_TRACE() will pass in "void *data, proto".
 */
#define __DO_TRACE(tp, proto, args)					\
	do {								\
-		void **it_func;						\
+		struct tracepoint_func *it_func_ptr;			\
+		void *it_func;						\
+		void *__data;						\
									\
		rcu_read_lock_sched_notrace();				\
-		it_func = rcu_dereference_sched((tp)->funcs);		\
-		if (it_func) {						\
+		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+		if (it_func_ptr) {					\
			do {						\
-				((void(*)(proto))(*it_func))(args);	\
-			} while (*(++it_func));				\
+				it_func = (it_func_ptr)->func;		\
+				__data = (it_func_ptr)->data;		\
+				((void(*)(proto))(it_func))(args);	\
+			} while ((++it_func_ptr)->func);		\
		}							\
		rcu_read_unlock_sched_notrace();			\
	} while (0)
@@ -122,24 +140,32 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
 * not add unwanted padding between the beginning of the section and the
 * structure. Force alignment to the same alignment as the section start.
 */
-#define DECLARE_TRACE(name, proto, args)				\
+#define __DECLARE_TRACE(name, proto, args, data_proto, data_args)	\
	extern struct tracepoint __tracepoint_##name;			\
	static inline void trace_##name(proto)				\
	{								\
		if (unlikely(__tracepoint_##name.state))		\
			__DO_TRACE(&__tracepoint_##name,		\
-				TP_PROTO(proto), TP_ARGS(args));	\
+				TP_PROTO(data_proto),			\
+				TP_ARGS(data_args));			\
+	}								\
+	static inline int						\
+	register_trace_##name(void (*probe)(data_proto), void *data)	\
+	{								\
+		return tracepoint_probe_register(#name, (void *)probe,	\
						 data);			\
	}								\
-	static inline int register_trace_##name(void (*probe)(proto))	\
+	static inline int						\
+	unregister_trace_##name(void (*probe)(data_proto), void *data)	\
	{								\
-		return tracepoint_probe_register(#name, (void *)probe); \
+		return tracepoint_probe_unregister(#name, (void *)probe, \
						   data);		\
	}								\
-	static inline int unregister_trace_##name(void (*probe)(proto)) \
+	static inline void						\
+	check_trace_callback_type_##name(void (*cb)(data_proto))	\
	{								\
-		return tracepoint_probe_unregister(#name, (void *)probe);\
	}
-
 #define DEFINE_TRACE_FN(name, reg, unreg)				\
	static const char __tpstrtab_##name[]				\
	__attribute__((section("__tracepoints_strings"))) = #name;	\
@@ -156,18 +182,23 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
	EXPORT_SYMBOL(__tracepoint_##name)
 
 #else /* !CONFIG_TRACEPOINTS */
-#define DECLARE_TRACE(name, proto, args)				\
-	static inline void _do_trace_##name(struct tracepoint *tp, proto) \
-	{ }								\
+#define __DECLARE_TRACE(name, proto, args, data_proto, data_args)	\
	static inline void trace_##name(proto)				\
	{ }								\
-	static inline int register_trace_##name(void (*probe)(proto))	\
+	static inline int						\
+	register_trace_##name(void (*probe)(data_proto),		\
			      void *data)				\
	{								\
		return -ENOSYS;						\
	}								\
-	static inline int unregister_trace_##name(void (*probe)(proto)) \
+	static inline int						\
+	unregister_trace_##name(void (*probe)(data_proto),		\
				void *data)				\
	{								\
		return -ENOSYS;						\
+	}								\
+	static inline void check_trace_callback_type_##name(void (*cb)(data_proto)) \
+	{								\
	}
 
 #define DEFINE_TRACE_FN(name, reg, unreg)
@@ -176,6 +207,29 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
 #define EXPORT_TRACEPOINT_SYMBOL(name)
 
 #endif /* CONFIG_TRACEPOINTS */
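[ The __DO_TRACE() loop above walks an array of (func, data) pairs
  terminated by a NULL func, handing each probe its registration
  cookie as the first argument. A user-space rendering of that loop
  follows, with hypothetical probes and without the RCU protection
  the kernel version needs; the void* function-pointer casts mirror
  the kernel code and are not strictly portable C. ]

#include <stdio.h>

/* func/data pairs, as in the new struct tracepoint_func */
struct tracepoint_func {
	void *func;
	void *data;
};

typedef void (*probe_fn)(void *data, int value);

/* a probe now receives its registration cookie first */
static void count_probe(void *data, int value)
{
	int *counter = data;
	*counter += value;
}

static void print_probe(void *data, int value)
{
	printf("%s: %d\n", (const char *)data, value);
}

/* NULL-func terminated array walk, mirroring __DO_TRACE's loop */
static void do_trace(struct tracepoint_func *it_func_ptr, int value)
{
	do {
		probe_fn fn = (probe_fn)it_func_ptr->func;
		fn(it_func_ptr->data, value);
	} while ((++it_func_ptr)->func);
}

int main(void)
{
	int counter = 0;
	struct tracepoint_func funcs[] = {
		{ (void *)count_probe, &counter },
		{ (void *)print_probe, "sample" },
		{ NULL, NULL },			/* terminator */
	};

	do_trace(funcs, 5);
	printf("counter=%d\n", counter);
	return 0;
}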
 
+/*
+ * The need for the DECLARE_TRACE_NOARGS() is to handle the prototype
+ * (void). "void" is a special value in a function prototype and can
+ * not be combined with other arguments. Since the DECLARE_TRACE()
+ * macro adds a data element at the beginning of the prototype,
+ * we need a way to differentiate "(void *data, proto)" from
+ * "(void *data, void)". The second prototype is invalid.
+ *
+ * DECLARE_TRACE_NOARGS() passes "void" as the tracepoint prototype
+ * and "void *__data" as the callback prototype.
+ *
+ * DECLARE_TRACE() passes "proto" as the tracepoint prototype and
+ * "void *__data, proto" as the callback prototype.
+ */
+#define DECLARE_TRACE_NOARGS(name)					\
+		__DECLARE_TRACE(name, void, , void *__data, __data)
+
+#define DECLARE_TRACE(name, proto, args)				\
+		__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),	\
+				PARAMS(void *__data, proto),		\
+				PARAMS(__data, args))
+
 #endif /* DECLARE_TRACE */
 
 #ifndef TRACE_EVENT
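[ To make the void-prototype point concrete: below is roughly what
  DECLARE_TRACE_NOARGS(foo) would cause __DECLARE_TRACE to emit for a
  hypothetical no-argument tracepoint "foo", with a stub standing in
  for the real registration internals. Illustration only, heavily
  trimmed. ]

#include <stdio.h>

/* stub standing in for the kernel's registration internals */
static int tracepoint_probe_register(const char *name, void *probe, void *data)
{
	printf("register %s probe=%p data=%p\n", name, probe, data);
	return 0;
}

/*
 * The probe type becomes (void *__data) -- legal C -- instead of the
 * invalid (void *__data, void) that naive macro expansion would give.
 */
static inline int
register_trace_foo(void (*probe)(void *__data), void *data)
{
	return tracepoint_probe_register("foo", (void *)probe, data);
}

static void my_probe(void *__data)
{
	(void)__data;
	puts("foo fired");
}

int main(void)
{
	return register_trace_foo(my_probe, NULL);
}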
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 16253db..0152b86 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -62,7 +62,10 @@
 		struct trace_entry	ent;				\
 		tstruct							\
 		char			__data[0];			\
-	};
+	};								\
+									\
+	static struct ftrace_event_class event_class_##name;
+
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args)	\
 	static struct ftrace_event_call			\
@@ -147,7 +150,7 @@
  *
  *	entry = iter->ent;
  *
- *	if (entry->type != event_<call>.id) {
+ *	if (entry->type != event_<call>->event.type) {
  *		WARN_ON_ONCE(1);
  *		return TRACE_TYPE_UNHANDLED;
  *	}
@@ -203,18 +206,22 @@
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
 static notrace enum print_line_t					\
-ftrace_raw_output_id_##call(int event_id, const char *name,		\
-			    struct trace_iterator *iter, int flags)	\
+ftrace_raw_output_##call(struct trace_iterator *iter, int flags,	\
+			 struct trace_event *trace_event)		\
 {									\
+	struct ftrace_event_call *event;				\
 	struct trace_seq *s = &iter->seq;				\
 	struct ftrace_raw_##call *field;				\
 	struct trace_entry *entry;					\
 	struct trace_seq *p;						\
 	int ret;							\
 									\
+	event = container_of(trace_event, struct ftrace_event_call,	\
+			     event);					\
+									\
 	entry = iter->ent;						\
 									\
-	if (entry->type != event_id) {					\
+	if (entry->type != event->event.type) {				\
 		WARN_ON_ONCE(1);					\
 		return TRACE_TYPE_UNHANDLED;				\
 	}								\
@@ -223,7 +230,7 @@ ftrace_raw_output_id_##call(int event_id, const char *name,		\
 									\
 	p = &get_cpu_var(ftrace_event_seq);				\
 	trace_seq_init(p);						\
-	ret = trace_seq_printf(s, "%s: ", name);			\
+	ret = trace_seq_printf(s, "%s: ", event->name);			\
 	if (ret)							\
 		ret = trace_seq_printf(s, print);			\
 	put_cpu();							\
@@ -231,21 +238,16 @@ ftrace_raw_output_id_##call(int event_id, const char *name,		\
 		return TRACE_TYPE_PARTIAL_LINE;				\
 									\
 	return TRACE_TYPE_HANDLED;					\
-}
-
-#undef DEFINE_EVENT
-#define DEFINE_EVENT(template, name, proto, args)			\
-static notrace enum print_line_t					\
-ftrace_raw_output_##name(struct trace_iterator *iter, int flags)	\
-{									\
-	return ftrace_raw_output_id_##template(event_##name.id,	\
-					       #name, iter, flags);	\
-}
+}									\
+static struct trace_event_functions ftrace_event_type_funcs_##call = {	\
+	.trace			= ftrace_raw_output_##call,		\
+};
 
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, call, proto, args, print)		\
 static notrace enum print_line_t					\
-ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
+ftrace_raw_output_##call(struct trace_iterator *iter, int flags,	\
+			 struct trace_event *event)			\
 {									\
 	struct trace_seq *s = &iter->seq;				\
 	struct ftrace_raw_##template *field;				\
@@ -255,7 +257,7 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
 									\
 	entry = iter->ent;						\
 									\
-	if (entry->type != event_##call.id) {				\
+	if (entry->type != event_##call.event.type) {			\
 		WARN_ON_ONCE(1);					\
 		return TRACE_TYPE_UNHANDLED;				\
 	}								\
@@ -272,7 +274,10 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
 		return TRACE_TYPE_PARTIAL_LINE;				\
 									\
 	return TRACE_TYPE_HANDLED;					\
-}
+}									\
+static struct trace_event_functions ftrace_event_type_funcs_##call = {	\
+	.trace			= ftrace_raw_output_##call,		\
+};
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
@@ -378,80 +383,18 @@ static inline notrace int ftrace_get_offsets_##call(			\
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
-#ifdef CONFIG_PERF_EVENTS
-
-/*
- * Generate the functions needed for tracepoint perf_event support.
- *
- * NOTE: The insertion profile callback (ftrace_profile_<call>) is defined later
- *
- * static int ftrace_profile_enable_<call>(void)
- * {
- * 	return register_trace_<call>(ftrace_profile_<call>);
- * }
- *
- * static void ftrace_profile_disable_<call>(void)
- * {
- * 	unregister_trace_<call>(ftrace_profile_<call>);
- * }
- *
- */
-
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)
-
-#undef DEFINE_EVENT
-#define DEFINE_EVENT(template, name, proto, args)			\
-									\
-static void perf_trace_##name(proto);					\
-									\
-static notrace int							\
-perf_trace_enable_##name(struct ftrace_event_call *unused)		\
-{									\
-	return register_trace_##name(perf_trace_##name);		\
-}									\
-									\
-static notrace void							\
-perf_trace_disable_##name(struct ftrace_event_call *unused)		\
-{									\
-	unregister_trace_##name(perf_trace_##name);			\
-}
-
-#undef DEFINE_EVENT_PRINT
-#define DEFINE_EVENT_PRINT(template, name, proto, args, print)		\
-	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
-
-#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
-
-#endif /* CONFIG_PERF_EVENTS */
-
 /*
 * Stage 4 of the trace events.
 *
 * Override the macros in <trace/trace_events.h> to include the following:
 *
- * static void ftrace_event_<call>(proto)
- * {
- *	event_trace_printk(_RET_IP_, "<call>: " <fmt>);
- * }
- *
- * static int ftrace_reg_event_<call>(struct ftrace_event_call *unused)
- * {
- *	return register_trace_<call>(ftrace_event_<call>);
- * }
- *
- * static void ftrace_unreg_event_<call>(struct ftrace_event_call *unused)
- * {
- *	unregister_trace_<call>(ftrace_event_<call>);
- * }
- *
- *
 * For those macros defined with TRACE_EVENT:
 *
 * static struct ftrace_event_call event_<call>;
 *
- * static void ftrace_raw_event_<call>(proto)
+ * static void ftrace_raw_event_<call>(void *__data, proto)
 * {
+ *	struct ftrace_event_call *event_call = __data;
 *	struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
 *	struct ring_buffer_event *event;
 *	struct ftrace_raw_<call> *entry; <-- defined in stage 1
@@ -466,7 +409,7 @@ perf_trace_disable_##name(struct ftrace_event_call *unused)		\
 *	__data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
 *
 *	event = trace_current_buffer_lock_reserve(&buffer,
- *				  event_<call>.id,
+ *				  event_<call>->event.type,
 *				  sizeof(*entry) + __data_size,
 *				  irq_flags, pc);
 *	if (!event)
@@ -481,43 +424,42 @@ perf_trace_disable_##name(struct ftrace_event_call *unused)		\
 *				   event, irq_flags, pc);
 * }
 *
- * static int ftrace_raw_reg_event_<call>(struct ftrace_event_call *unused)
- * {
- *	return register_trace_<call>(ftrace_raw_event_<call>);
- * }
- *
- * static void ftrace_unreg_event_<call>(struct ftrace_event_call *unused)
- * {
- *	unregister_trace_<call>(ftrace_raw_event_<call>);
- * }
- *
 * static struct trace_event ftrace_event_type_<call> = {
 *	.trace			= ftrace_raw_output_<call>, <-- stage 2
 * };
 *
 * static const char print_fmt_<call>[] = <TP_printk>;
 *
+ * static struct ftrace_event_class __used event_class_