* perf, x86: Add last TSX PMU code for Haswell
@ 2013-09-13 18:08 Andi Kleen
From: Andi Kleen @ 2013-09-13 18:08 UTC
To: mingo; +Cc: peterz, acme, linux-kernel, eranian
[This has kernel and user parts, so it will need
review/ack/merges from both perf kernel and user land maintainers]

This is currently the last part of the TSX PMU code,
just adding the leftover bits.

This adds some changes to the user interfaces.
I'll send patches for the manpage separately.

- Report the transaction abort flags to user space
  using a new field, and add the code to display them.
  This is used to classify abort types, which is fairly
  important for tuning as it guides the tuning process,
  together with the abort weight that was added earlier.
  [3 patches, generic, x86, user tools]

- Add support for reporting the two new TSX LBR flags: in_tx
  and abort_tx. The code to handle the LBRs was already
  added earlier, this just adds the code to report,
  filter and display them.

- Add a workaround for a Haswell issue where it reports
  an extra LBR record for every abort. We just filter
  those out in the kernel.

Open perf TSX issues left:

- Revisit automatic enabling of precise for tx/el-abort
- Need to fix the sort handling in the user tools
  to actually sort on other fields
- The aggregated LBR display in the user tools is not
  very useful for transactions, need a way to report them
  in a histogram like backtraces.
- May want some short cut options for
  record --transaction --weight / report --sort symbol,transaction,weight

-Andi

* [PATCH 1/6] perf, core: Add generic transaction flags v4
From: Andi Kleen @ 2013-09-13 18:08 UTC
To: mingo; +Cc: peterz, acme, linux-kernel, eranian, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a generic qualifier for transaction events, as a new sample
type that returns a flag word. This is particularly useful
for qualifying aborts: to distinguish aborts which happen
due to asynchronous events (like conflicts caused by another
CPU) versus instructions that lead to an abort.

The tuning strategies are very different for those cases,
so it's important to distinguish them easily and early.

Since it's inconvenient and inflexible to filter for this
in the kernel we report all the events out and allow
some post processing in user space.

The flags are based on the Intel TSX events, but should be fairly
generic and mostly applicable to other HTM architectures too. In addition
to various flag words there's also reserved space to report a
program-supplied abort code. For TSX this is used to distinguish specific
classes of aborts, like a lock busy abort when doing lock elision.

Flags:

Elision and generic transactions                            (ELISION vs TRANSACTION)
(HLE vs RTM on TSX; IBM etc. would likely only use TRANSACTION)
Aborts caused by current thread vs aborts caused by others  (SYNC vs ASYNC)
Retryable transaction                                       (RETRY)
Conflicts with other threads                                (CONFLICT)
Transaction write capacity overflow                         (CAPACITY WRITE)
Transaction read capacity overflow                          (CAPACITY READ)

Transactions implicitly aborted can also return an abort code.
This can be used to signal specific events to the profiler. A common
case is abort on lock busy in a RTM eliding library (code 0xff).
To handle this case we include the TSX abort code.

Common example aborts in TSX would be:

- Data conflict with another thread on memory read.
                                      Flags: TRANSACTION|ASYNC|CONFLICT
- executing a WRMSR in a transaction. Flags: TRANSACTION|SYNC
- HLE transaction in user space is too large
                                      Flags: ELISION|SYNC|CAPACITY-WRITE

The only flag that is somewhat TSX specific is ELISION.

This adds the perf core glue needed for reporting the new flag word out.

v2: Add MEM/MISC
v3: Move transaction to the end
v4: Separate capacity-read/write and remove misc
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/perf_event.h      |  5 +++++
 include/uapi/linux/perf_event.h | 25 ++++++++++++++++++++++++-
 kernel/events/core.c            |  6 ++++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 866e85c..82dba57 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -562,6 +562,10 @@ struct perf_sample_data {
 	struct perf_regs_user		regs_user;
 	u64				stack_user_size;
 	u64				weight;
+	/*
+	 * Transaction flags for abort events:
+	 */
+	u64				transaction;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -577,6 +581,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->stack_user_size = 0;
 	data->weight = 0;
 	data->data_src.val = 0;
+	data->transaction = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ca1d90b..8877965 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -136,8 +136,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_WEIGHT			= 1U << 14,
 	PERF_SAMPLE_DATA_SRC			= 1U << 15,
 	PERF_SAMPLE_IDENTIFIER			= 1U << 16,
+	PERF_SAMPLE_TRANSACTION			= 1U << 17,
 
-	PERF_SAMPLE_MAX = 1U << 17,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 18,		/* non-ABI */
 };
 
 /*
@@ -181,6 +182,28 @@ enum perf_sample_regs_abi {
 };
 
 /*
+ * Values for the transaction event qualifier, mostly for abort events.
+ * Multiple bits can be set.
+ */
+enum {
+	PERF_SAMPLE_TXN_ELISION		= (1 << 0), /* From elision */
+	PERF_SAMPLE_TXN_TRANSACTION	= (1 << 1), /* From transaction */
+	PERF_SAMPLE_TXN_SYNC		= (1 << 2), /* Instruction is related */
+	PERF_SAMPLE_TXN_ASYNC		= (1 << 3), /* Instruction not related */
+	PERF_SAMPLE_TXN_RETRY		= (1 << 4), /* Retry possible */
+	PERF_SAMPLE_TXN_CONFLICT	= (1 << 5), /* Conflict abort */
+	PERF_SAMPLE_TXN_CAPACITY_WRITE	= (1 << 6), /* Capacity write abort */
+	PERF_SAMPLE_TXN_CAPACITY_READ	= (1 << 7), /* Capacity read abort */
+
+	PERF_SAMPLE_TXN_MAX		= (1 << 8), /* non-ABI */
+
+	/* bits 24..31 are reserved for the abort code */
+
+	PERF_SAMPLE_TXN_ABORT_MASK	= 0xff000000,
+	PERF_SAMPLE_TXN_ABORT_SHIFT	= 24,
+};
+
+/*
  * The format of the data returned by read() on a perf event fd,
  * as specified by attr.read_format:
  *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dd236b6..e8ab646 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1201,6 +1201,9 @@ static void perf_event__header_size(struct perf_event *event)
 	if (sample_type & PERF_SAMPLE_DATA_SRC)
 		size += sizeof(data->data_src.val);
 
+	if (sample_type & PERF_SAMPLE_TRANSACTION)
+		size += sizeof(data->transaction);
+
 	event->header_size = size;
 }
 
@@ -4551,6 +4554,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_DATA_SRC)
 		perf_output_put(handle, data->data_src.val);
 
+	if (sample_type & PERF_SAMPLE_TRANSACTION)
+		perf_output_put(handle, data->transaction);
+
 	if (!event->attr.watermark) {
 		int wakeup_events = event->attr.wakeup_events;
 
-- 
1.8.3.1
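
For readers on the user-space side, the flag word described in this patch could be decoded roughly as follows. This is only a sketch: the bit values mirror the PERF_SAMPLE_TXN_* enum in the patch, but the short TXN_* macro names and the helpers txn_describe() and txn_abort_code() are hypothetical and are not part of the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Bit values mirror the PERF_SAMPLE_TXN_* enum in the patch. The short
 * TXN_* names and both helpers below are hypothetical, for illustration.
 */
#define TXN_ELISION		(1ULL << 0)
#define TXN_TRANSACTION		(1ULL << 1)
#define TXN_SYNC		(1ULL << 2)
#define TXN_ASYNC		(1ULL << 3)
#define TXN_RETRY		(1ULL << 4)
#define TXN_CONFLICT		(1ULL << 5)
#define TXN_CAPACITY_WRITE	(1ULL << 6)
#define TXN_CAPACITY_READ	(1ULL << 7)
#define TXN_ABORT_MASK		0xff000000ULL
#define TXN_ABORT_SHIFT		24

static const struct {
	uint64_t bit;
	const char *name;
} txn_names[] = {
	{ TXN_ELISION,		"ELISION" },
	{ TXN_TRANSACTION,	"TRANSACTION" },
	{ TXN_SYNC,		"SYNC" },
	{ TXN_ASYNC,		"ASYNC" },
	{ TXN_RETRY,		"RETRY" },
	{ TXN_CONFLICT,		"CONFLICT" },
	{ TXN_CAPACITY_WRITE,	"CAPACITY-WRITE" },
	{ TXN_CAPACITY_READ,	"CAPACITY-READ" },
};

/* Write the names of all set flags into buf, separated by '|'. */
static void txn_describe(uint64_t txn, char *buf, size_t len)
{
	size_t i;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(txn_names) / sizeof(txn_names[0]); i++) {
		if (!(txn & txn_names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, txn_names[i].name, len - strlen(buf) - 1);
	}
}

/* Extract the program-supplied abort code from bits 24..31. */
static unsigned int txn_abort_code(uint64_t txn)
{
	return (unsigned int)((txn & TXN_ABORT_MASK) >> TXN_ABORT_SHIFT);
}
```

With these helpers, the "data conflict with another thread" example from the changelog would print as TRANSACTION|ASYNC|CONFLICT, and a lock-busy abort from an RTM eliding library would yield abort code 0xff.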

* Re: [PATCH 1/6] perf, core: Add generic transaction flags v4
From: Peter Zijlstra @ 2013-09-16 10:58 UTC
To: Andi Kleen; +Cc: mingo, acme, linux-kernel, eranian, Andi Kleen

On Fri, Sep 13, 2013 at 11:08:31AM -0700, Andi Kleen wrote:
>  /*
> + * Values for the transaction event qualifier, mostly for abort events.
> + * Multiple bits can be set.
> + */
> +enum {
> +	PERF_SAMPLE_TXN_ELISION		= (1 << 0), /* From elision */
> +	PERF_SAMPLE_TXN_TRANSACTION	= (1 << 1), /* From transaction */
> +	PERF_SAMPLE_TXN_SYNC		= (1 << 2), /* Instruction is related */
> +	PERF_SAMPLE_TXN_ASYNC		= (1 << 3), /* Instruction not related */
> +	PERF_SAMPLE_TXN_RETRY		= (1 << 4), /* Retry possible */
> +	PERF_SAMPLE_TXN_CONFLICT	= (1 << 5), /* Conflict abort */
> +	PERF_SAMPLE_TXN_CAPACITY_WRITE	= (1 << 6), /* Capacity write abort */
> +	PERF_SAMPLE_TXN_CAPACITY_READ	= (1 << 7), /* Capacity read abort */
> +
> +	PERF_SAMPLE_TXN_MAX		= (1 << 8), /* non-ABI */
> +
> +	/* bits 24..31 are reserved for the abort code */
> +
> +	PERF_SAMPLE_TXN_ABORT_MASK	= 0xff000000,
> +	PERF_SAMPLE_TXN_ABORT_SHIFT	= 24,
> +};

Why bits 24..31? Why not push the abort code into the upper 32 bits?

* Re: [PATCH 1/6] perf, core: Add generic transaction flags v4
From: Peter Zijlstra @ 2013-09-16 11:04 UTC
To: Andi Kleen; +Cc: mingo, acme, linux-kernel, eranian, Andi Kleen

On Fri, Sep 13, 2013 at 11:08:31AM -0700, Andi Kleen wrote:
>  /*
> + * Values for the transaction event qualifier, mostly for abort events.
> + * Multiple bits can be set.
> + */
> +enum {
> +	PERF_SAMPLE_TXN_ELISION		= (1 << 0), /* From elision */
> +	PERF_SAMPLE_TXN_TRANSACTION	= (1 << 1), /* From transaction */
> +	PERF_SAMPLE_TXN_SYNC		= (1 << 2), /* Instruction is related */
> +	PERF_SAMPLE_TXN_ASYNC		= (1 << 3), /* Instruction not related */
> +	PERF_SAMPLE_TXN_RETRY		= (1 << 4), /* Retry possible */
> +	PERF_SAMPLE_TXN_CONFLICT	= (1 << 5), /* Conflict abort */
> +	PERF_SAMPLE_TXN_CAPACITY_WRITE	= (1 << 6), /* Capacity write abort */
> +	PERF_SAMPLE_TXN_CAPACITY_READ	= (1 << 7), /* Capacity read abort */
> +
> +	PERF_SAMPLE_TXN_MAX		= (1 << 8), /* non-ABI */
> +
> +	/* bits 24..31 are reserved for the abort code */
> +
> +	PERF_SAMPLE_TXN_ABORT_MASK	= 0xff000000,
> +	PERF_SAMPLE_TXN_ABORT_SHIFT	= 24,
> +};

Also do we want to do s/_SAMPLE// on that? Sadly we have both patterns,
PERF_SAMPLE_BRANCH_xxx and PERF_MEM_xxx.

* [PATCH 2/6] perf, x86: Add Haswell specific transaction flag reporting v4
From: Andi Kleen @ 2013-09-13 18:08 UTC
To: mingo; +Cc: peterz, acme, linux-kernel, eranian, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

In the PEBS handler report the transaction flags using the new
generic transaction flags facility. Most of them come from
the "tsx_tuning" field in PEBSv2, but the abort code is derived
from the RAX register reported in the PEBS record.

v2: Fix interaction with precise-loads
v3: Mask out reserved bits. More comments.
v4: Adjust white space
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 104cbba..f798be8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -207,6 +207,8 @@ union hsw_tsx_tuning {
 	u64	    value;
 };
 
+#define PEBS_HSW_TSX_FLAGS	0xff00000000
+
 void init_debug_store_on_cpu(int cpu)
 {
 	struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -893,6 +895,16 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	    (x86_pmu.intel_cap.pebs_format >= 2))
 		data.weight = intel_hsw_weight(pebs);
 
+	if ((event->attr.sample_type & PERF_SAMPLE_TRANSACTION) &&
+	    x86_pmu.intel_cap.pebs_format >= 2) {
+		data.transaction =
+		     (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;
+		/* For RTM XABORTs also log the abort code from AX */
+		if ((data.transaction & PERF_SAMPLE_TXN_TRANSACTION) &&
+		    (pebs->ax & 1))
+			data.transaction |= pebs->ax & 0xff000000;
+	}
+
 	if (has_branch_stack(event))
 		data.br_stack = &cpuc->lbr_stack;
 
-- 
1.8.3.1
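
The derivation in the hunk above can be tried out in isolation. The following is a stand-alone sketch, not kernel code: hsw_txn_flags() and the TXN_* names are hypothetical, the mask is written with an explicit ULL suffix, and it assumes (as the shift in the patch implies) that bits 32..39 of tsx_tuning line up with the generic flag bits.

```c
#include <stdint.h>

#define PEBS_HSW_TSX_FLAGS	0xff00000000ULL	/* bits 32..39 of tsx_tuning */
#define TXN_TRANSACTION		(1ULL << 1)	/* mirrors PERF_SAMPLE_TXN_TRANSACTION */
#define TXN_ABORT_MASK		0xff000000ULL	/* mirrors PERF_SAMPLE_TXN_ABORT_MASK */

/*
 * Hypothetical helper: build the generic transaction word from the
 * PEBS tsx_tuning field and the AX value reported in the record.
 */
static uint64_t hsw_txn_flags(uint64_t tsx_tuning, uint64_t ax)
{
	uint64_t txn = (tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;

	/* Only RTM aborts with bit 0 of AX set carry an XABORT code. */
	if ((txn & TXN_TRANSACTION) && (ax & 1))
		txn |= ax & TXN_ABORT_MASK;
	return txn;
}
```

For example, a tsx_tuning value with TRANSACTION|SYNC in bits 32..39 combined with ax = 0xff000001 gives 0xff000006, while the same ax on an elision (HLE) abort leaves the abort code bits clear.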

* Re: [PATCH 2/6] perf, x86: Add Haswell specific transaction flag reporting v4
From: Peter Zijlstra @ 2013-09-16 11:08 UTC
To: Andi Kleen; +Cc: mingo, acme, linux-kernel, eranian, Andi Kleen

On Fri, Sep 13, 2013 at 11:08:32AM -0700, Andi Kleen wrote:
> @@ -207,6 +207,8 @@ union hsw_tsx_tuning {
>  	u64	    value;
>  };
>  
> +#define PEBS_HSW_TSX_FLAGS	0xff00000000

That's a 64bit value and this needs ULL.

>  void init_debug_store_on_cpu(int cpu)
>  {
>  	struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
> @@ -893,6 +895,16 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>  	    (x86_pmu.intel_cap.pebs_format >= 2))
>  		data.weight = intel_hsw_weight(pebs);
>  
> +	if ((event->attr.sample_type & PERF_SAMPLE_TRANSACTION) &&
> +	    x86_pmu.intel_cap.pebs_format >= 2) {
> +		data.transaction =
> +		     (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;

Screw checkpatch and make that an 81 char line or rename the thing
data.txn or so.

> +		/* For RTM XABORTs also log the abort code from AX */
> +		if ((data.transaction & PERF_SAMPLE_TXN_TRANSACTION) &&
> +		    (pebs->ax & 1))
> +			data.transaction |= pebs->ax & 0xff000000;

Yeah, do data.txn, that also allows the above line break to go away.

> +	}
> +
>  	if (has_branch_stack(event))
>  		data.br_stack = &cpuc->lbr_stack;
>  
> -- 
> 1.8.3.1
> 

* Re: [PATCH 2/6] perf, x86: Add Haswell specific transaction flag reporting v4
From: Peter Zijlstra @ 2013-09-16 11:21 UTC
To: Andi Kleen; +Cc: mingo, acme, linux-kernel, eranian, Andi Kleen

On Fri, Sep 13, 2013 at 11:08:32AM -0700, Andi Kleen wrote:
> @@ -893,6 +895,16 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>  	    (x86_pmu.intel_cap.pebs_format >= 2))
>  		data.weight = intel_hsw_weight(pebs);
>  
> +	if ((event->attr.sample_type & PERF_SAMPLE_TRANSACTION) &&
> +	    x86_pmu.intel_cap.pebs_format >= 2) {
> +		data.transaction =
> +		     (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;
> +		/* For RTM XABORTs also log the abort code from AX */
> +		if ((data.transaction & PERF_SAMPLE_TXN_TRANSACTION) &&
> +		    (pebs->ax & 1))
> +			data.transaction |= pebs->ax & 0xff000000;
> +	}
> +
>  	if (has_branch_stack(event))
>  		data.br_stack = &cpuc->lbr_stack;
>  

Also, since we now have 2 format >= 2 branches we can combine them;
something like so?

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index f364c13..862e59f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -206,6 +206,8 @@ union hsw_tsx_tuning {
 	u64	    value;
 };
 
+#define PEBS_HSW_TSX_FLAGS	0xff00000000
+
 void init_debug_store_on_cpu(int cpu)
 {
 	struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -880,18 +882,30 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	else
 		regs.flags &= ~PERF_EFLAGS_EXACT;
 
-	if ((event->attr.sample_type & PERF_SAMPLE_ADDR) &&
-		x86_pmu.intel_cap.pebs_format >= 1)
+	if (has_branch_stack(event))
+		data.br_stack = &cpuc->lbr_stack;
+
+	if (x86_pmu.intel_cap.pebs_format < 1)
+		goto done;
+
+	if (event->attr.sample_type & PERF_SAMPLE_ADDR)
 		data.addr = pebs->dla;
 
+	if (x86_pmu.intel_cap.pebs_format < 2)
+		goto done;
+
 	/* Only set the TSX weight when no memory weight was requested. */
-	if ((event->attr.sample_type & PERF_SAMPLE_WEIGHT) && !fll &&
-	    (x86_pmu.intel_cap.pebs_format >= 2))
+	if ((event->attr.sample_type & PERF_SAMPLE_WEIGHT) && !fll)
 		data.weight = intel_hsw_weight(pebs);
 
-	if (has_branch_stack(event))
-		data.br_stack = &cpuc->lbr_stack;
+	if ((event->attr.sample_type & PERF_SAMPLE_TRANSACTION)) {
+		data.txn = (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;
+		/* For RTM XABORTs also log the abort code from AX */
+		if ((data.txn & PERF_SAMPLE_TXN_TRANSACTION) && (pebs->ax & 1))
+			data.txn |= pebs->ax & 0xff000000;
+	}
 
+done:
 	if (perf_event_overflow(event, &data, &regs))
 		x86_pmu_stop(event, 0);
 }

* [PATCH 3/6] perf, tools: Support sorting by in_tx, abort branch flags v3
From: Andi Kleen @ 2013-09-13 18:08 UTC
To: mingo; +Cc: peterz, acme, linux-kernel, eranian, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Extend the perf branch sorting code to support sorting by in_tx
or abort_tx qualifiers. Also print out those qualifiers.

This also fixes up some of the existing sort key documentation.

We do not support no_tx here, because it's simply not showing
the in_tx flag.

v2: Readd flags to man pages
v3: Rename intx
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |  4 ++-
 tools/perf/Documentation/perf-top.txt    |  3 +-
 tools/perf/builtin-report.c              |  2 +-
 tools/perf/builtin-top.c                 |  3 +-
 tools/perf/perf.h                        |  4 ++-
 tools/perf/util/hist.h                   |  2 ++
 tools/perf/util/sort.c                   | 51 ++++++++++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  2 ++
 8 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 2b8097e..ae337e3 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -71,7 +71,7 @@ OPTIONS
 	entries are displayed as "[other]".
 	- cpu: cpu number the task ran at the time of sample
 	- srcline: filename and line number executed at the time of sample.  The
-	DWARF debuggin info must be provided.
+	DWARF debugging info must be provided.
 
 	By default, comm, dso and symbol keys are used.
 	(i.e. --sort comm,dso,symbol)
@@ -85,6 +85,8 @@ OPTIONS
 	- symbol_from: name of function branched from
 	- symbol_to: name of function branched to
 	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
+	- in_tx: branch in TSX transaction
+	- abort: TSX transaction abort.
 
 	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
 	and symbol_to, see '--branch-stack'.
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 58d6598..f852eb5 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -112,7 +112,8 @@ Default is to monitor all CPUS.
 
 -s::
 --sort::
-	Sort by key(s): pid, comm, dso, symbol, parent, srcline, weight, local_weight.
+	Sort by key(s): pid, comm, dso, symbol, parent, srcline, weight,
+	local_weight, abort, in_tx
 
 -n::
 --show-nr-samples::
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8e50d8d..1e84103 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -786,7 +786,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline,"
 		   " dso_to, dso_from, symbol_to, symbol_from, mispredict,"
 		   " weight, local_weight, mem, symbol_daddr, dso_daddr, tlb, "
-		   "snoop, locked"),
+		   "snoop, locked, abort, in_tx"),
 	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
 		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 2122141..6534a37 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1103,7 +1103,8 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_INCR('v', "verbose", &verbose,
 		    "be more verbose (show counter open errors, etc)"),
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
-		   "sort by key(s): pid, comm, dso, symbol, parent, weight, local_weight"),
+		   "sort by key(s): pid, comm, dso, symbol, parent, weight, local_weight,"
+		   " abort, in_tx"),
 	OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
 		    "Show a column with the number of samples"),
 	OPT_CALLBACK_DEFAULT('G', "call-graph", &top.record_opts,
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index cf20187..acf3d66 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -182,7 +182,9 @@ struct ip_callchain {
 struct branch_flags {
 	u64 mispred:1;
 	u64 predicted:1;
-	u64 reserved:62;
+	u64 in_tx:1;
+	u64 abort:1;
+	u64 reserved:60;
 };
 
 struct branch_entry {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 1329b6b..f743e96 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -45,6 +45,8 @@ enum hist_column {
 	HISTC_CPU,
 	HISTC_SRCLINE,
 	HISTC_MISPREDICT,
+	HISTC_IN_TX,
+	HISTC_ABORT,
 	HISTC_SYMBOL_FROM,
 	HISTC_SYMBOL_TO,
 	HISTC_DSO_FROM,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 5f118a0..1771566 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -858,6 +858,55 @@ struct sort_entry sort_mem_snoop = {
 	.se_width_idx	= HISTC_MEM_SNOOP,
 };
 
+static int64_t
+sort__abort_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->branch_info->flags.abort !=
+		right->branch_info->flags.abort;
+}
+
+static int hist_entry__abort_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	static const char *out = ".";
+
+	if (self->branch_info->flags.abort)
+		out = "A";
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_abort = {
+	.se_header	= "Transaction abort",
+	.se_cmp		= sort__abort_cmp,
+	.se_snprintf	= hist_entry__abort_snprintf,
+	.se_width_idx	= HISTC_ABORT,
+};
+
+static int64_t
+sort__in_tx_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->branch_info->flags.in_tx !=
+		right->branch_info->flags.in_tx;
+}
+
+static int hist_entry__in_tx_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	static const char *out = ".";
+
+	if (self->branch_info->flags.in_tx)
+		out = "T";
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_in_tx = {
+	.se_header	= "Branch in transaction",
+	.se_cmp		= sort__in_tx_cmp,
+	.se_snprintf	= hist_entry__in_tx_snprintf,
+	.se_width_idx	= HISTC_IN_TX,
+};
+
 struct sort_dimension {
 	const char		*name;
 	struct sort_entry	*entry;
@@ -888,6 +937,8 @@ static struct sort_dimension bstack_sort_dimensions[] = {
 	DIM(SORT_SYM_FROM, "symbol_from", sort_sym_from),
 	DIM(SORT_SYM_TO, "symbol_to", sort_sym_to),
 	DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
+	DIM(SORT_IN_TX, "in_tx", sort_in_tx),
+	DIM(SORT_ABORT, "abort", sort_abort),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 4e80dbd..9dad3a0 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -153,6 +153,8 @@ enum sort_type {
 	SORT_SYM_FROM,
 	SORT_SYM_TO,
 	SORT_MISPREDICT,
+	SORT_ABORT,
+	SORT_IN_TX,
 
 	/* memory mode specific sort keys */
 	__SORT_MEMORY_MODE,
-- 
1.8.3.1
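
A detail that may be worth checking in the two snprintf helpers of this patch: `out` is declared `static`, so once a single entry has set it to "A" (or "T"), later entries without the flag would appear to print the same marker. The stand-alone sketch below only illustrates that C behaviour; abort_mark_sticky() and abort_mark() are hypothetical names, not functions from perf.

```c
#include <stdbool.h>

/*
 * Same shape as the helper in the patch: `out` has static storage
 * duration, so an assignment survives into later calls.
 */
static const char *abort_mark_sticky(bool abort)
{
	static const char *out = ".";

	if (abort)
		out = "A";
	return out;
}

/*
 * Variant with automatic storage: `out` starts from "." on every
 * call, so the marker reflects only the current entry.
 */
static const char *abort_mark(bool abort)
{
	const char *out = ".";

	if (abort)
		out = "A";
	return out;
}
```

Calling abort_mark_sticky(true) followed by abort_mark_sticky(false) returns "A" both times, whereas abort_mark(false) always returns ".". Whether this matters for the patch depends on the order in which hist entries are printed.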

* [PATCH 4/6] perf, tools: Add abort_tx,no_tx,in_tx branch filter options to perf record -j v3
From: Andi Kleen @ 2013-09-13 18:08 UTC
To: mingo; +Cc: peterz, acme, linux-kernel, eranian, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Make perf record -j aware of the new in_tx,no_tx,abort_tx branch qualifiers.

v2: ABORT -> ABORTTX
v3: Add more _
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt | 3 +++
 tools/perf/builtin-record.c              | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index e297b74..6bec1c9 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -166,6 +166,9 @@ following filters are defined:
         - u:  only when the branch target is at the user level
         - k: only when the branch target is in the kernel
         - hv: only when the target is at the hypervisor level
+	- in_tx: only when the target is in a hardware transaction
+	- no_tx: only when the target is not in a hardware transaction
+	- abort_tx: only when the target is a hardware transaction abort
 
 +
 The option requires at least one branch type among any, any_call, any_ret, ind_call.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a41ac415..8384b54 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -618,6 +618,9 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
 	BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
 	BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
+	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
+	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
+	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
 	BRANCH_END
 };
 
-- 
1.8.3.1
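
To illustrate how a name table like branch_modes[] turns a -j argument such as any,u,in_tx into a filter mask, here is a simplified stand-alone sketch. It is not perf's parse_branch_stack(): the BR_* bit values, the modes[] table and parse_branch_filter() are all hypothetical, and the real bits are the PERF_SAMPLE_BRANCH_* constants.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values, chosen only for this illustration. */
enum {
	BR_USER		= 1 << 0,
	BR_KERNEL	= 1 << 1,
	BR_ANY		= 1 << 2,
	BR_ABORT_TX	= 1 << 3,
	BR_IN_TX	= 1 << 4,
	BR_NO_TX	= 1 << 5,
};

struct branch_mode {
	const char *name;
	int mode;
};

/* NULL-terminated name table, similar in spirit to branch_modes[]. */
static const struct branch_mode modes[] = {
	{ "u",		BR_USER },
	{ "k",		BR_KERNEL },
	{ "any",	BR_ANY },
	{ "abort_tx",	BR_ABORT_TX },
	{ "in_tx",	BR_IN_TX },
	{ "no_tx",	BR_NO_TX },
	{ NULL,		0 },
};

/*
 * Parse a comma separated filter list. Returns 0 and stores the OR of
 * all matched modes in *mask, or -1 if any name is unknown.
 */
static int parse_branch_filter(const char *str, uint64_t *mask)
{
	uint64_t m = 0;

	while (*str) {
		size_t n = strcspn(str, ",");	/* length of this token */
		const struct branch_mode *bm;

		for (bm = modes; bm->name; bm++)
			if (strlen(bm->name) == n &&
			    strncmp(bm->name, str, n) == 0)
				break;
		if (!bm->name)
			return -1;
		m |= (uint64_t)bm->mode;
		str += n;
		if (*str == ',')
			str++;
	}
	*mask = m;
	return 0;
}
```

Adding a new qualifier then only needs a new table row, which is the same shape as the three BRANCH_OPT() lines added by this patch.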

* [PATCH 5/6] perf, tools: Add support for record transaction flags v4
From: Andi Kleen @ 2013-09-13 18:08 UTC
To: mingo; +Cc: peterz, acme, linux-kernel, eranian, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add support for recording and displaying the transaction flags.
They are essentially a new sort key. Also display them
in a nice way to the user.

v2: Fix manpage
v3: Move transaction to the end
v4: Handle capacity-read/write
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt |  4 +-
 tools/perf/Documentation/perf-report.txt |  4 ++
 tools/perf/Documentation/perf-top.txt    |  2 +-
 tools/perf/builtin-annotate.c            |  2 +-
 tools/perf/builtin-diff.c                |  8 ++--
 tools/perf/builtin-record.c              |  2 +
 tools/perf/builtin-report.c              |  4 +-
 tools/perf/builtin-top.c                 |  5 +--
 tools/perf/perf.h                        |  1 +
 tools/perf/tests/hists_link.c            |  6 ++-
 tools/perf/util/event.h                  |  1 +
 tools/perf/util/evsel.c                  |  9 ++++
 tools/perf/util/hist.c                   |  7 ++-
 tools/perf/util/hist.h                   |  4 +-
 tools/perf/util/session.c                |  3 ++
 tools/perf/util/sort.c                   | 73 ++++++++++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  2 +
 17 files changed, 122 insertions(+), 15 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 6bec1c9..f732eaa 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -179,12 +179,14 @@ is enabled for all the sampling events. The sampled branch type is the same for
 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
--W::
 --weight::
 Enable weightened sampling. An additional weight is recorded per sample and can be
 displayed with the weight and local_weight sort keys.  This currently works for TSX
 abort events and some memory events in precise mode on modern Intel CPUs.
 
+--transaction::
+Record transaction flags for transaction related events.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index ae337e3..be5ad87 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -72,6 +72,10 @@ OPTIONS
 	- cpu: cpu number the task ran at the time of sample
 	- srcline: filename and line number executed at the time of sample.  The
 	DWARF debugging info must be provided.
+	- weight: Event specific weight, e.g. memory latency or transaction
+	abort cost. This is the global weight.
+	- local_weight: Local weight version of the weight above.
+	- transaction: Transaction abort flags.
 
 	By default, comm, dso and symbol keys are used.
 	(i.e. --sort comm,dso,symbol)
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index f852eb5..6d70fbf 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -113,7 +113,7 @@ Default is to monitor all CPUS.
 -s::
 --sort::
 	Sort by key(s): pid, comm, dso, symbol, parent, srcline, weight,
-	local_weight, abort, in_tx
+	local_weight, abort, in_tx, transaction
 
 -n::
 --show-nr-samples::
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 5ebd0c3..0393d98 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -63,7 +63,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1);
+	he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1, 0);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index f28799e..2a78dc8 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -304,9 +304,10 @@ static int formula_fprintf(struct hist_entry *he, struct hist_entry *pair,
 
 static int hists__add_entry(struct hists *self,
 			    struct addr_location *al, u64 period,
-			    u64 weight)
+			    u64 weight, u64 transaction)
 {
-	if (__hists__add_entry(self, al, NULL, period, weight) != NULL)
+	if (__hists__add_entry(self, al, NULL, period, weight, transaction)
+				       != NULL)
 		return 0;
 	return -ENOMEM;
 }
@@ -328,7 +329,8 @@ static int diff__process_sample_event(struct perf_tool *tool __maybe_unused,
 	if (al.filtered)
 		return 0;
 
-	if (hists__add_entry(&evsel->hists, &al, sample->period, sample->weight)) {
+	if (hists__add_entry(&evsel->hists, &al, sample->period,
+			     sample->weight, sample->transaction)) {
 		pr_warning("problem incrementing symbol period, skipping event\n");
 		return -1;
 	}
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8384b54..a78db3f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -894,6 +894,8 @@ const struct option record_options[] = {
 		     parse_branch_stack),
 	OPT_BOOLEAN('W', "weight", &record.opts.sample_weight,
 		    "sample by weight (on special events only)"),
+	OPT_BOOLEAN(0, "transaction", &record.opts.sample_transaction,
+		    "sample transaction flags (special events only)"),
 	OPT_END()
 };
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 1e84103..8657a3d 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -259,7 +259,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	}
 
 	he = __hists__add_entry(&evsel->hists, al, parent, sample->period,
-					sample->weight);
+				sample->weight, sample->transaction);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -786,7 +786,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline,"
 		   " dso_to, dso_from, symbol_to, symbol_from, mispredict,"
 		   " weight, local_weight, mem, symbol_daddr, dso_daddr, tlb, "
-		   "snoop, locked, abort, in_tx"),
+		   "snoop, locked, abort, in_tx, transaction"),
 	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
 		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 6534a37..b3e0229 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -247,9 +247,8 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 
 	pthread_mutex_lock(&evsel->hists.lock);
 	he = __hists__add_entry(&evsel->hists, al, NULL, sample->period,
-				sample->weight);
+				sample->weight, sample->transaction);
 	pthread_mutex_unlock(&evsel->hists.lock);
-
 	if (he == NULL)
 		return NULL;
 
@@ -1104,7 +1103,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "be more verbose (show counter open errors, etc)"),
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
 		   "sort by key(s): pid, comm, dso, symbol, parent, weight, local_weight,"
-		   " abort, in_tx"),
+		   " abort, in_tx, transaction"),
 	OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
 		    "Show a column with the number of samples"),
 	OPT_CALLBACK_DEFAULT('G', "call-graph", &top.record_opts,
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index acf3d66..84502e8 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -233,6 +233,7 @@ struct perf_record_opts {
 	u64	     default_interval;
 	u64	     user_interval;
 	u16	     stack_dump_size;
+	bool	     sample_transaction;
 };
 
 #endif
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 4228ffc..025503a 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -222,7 +222,8 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 							  &sample) < 0)
 				goto out;
 
-			he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1);
+			he = __hists__add_entry(&evsel->hists, &al, NULL,
+						1, 1, 0);
 			if (he == NULL)
 				goto out;
 
@@ -244,7 +245,8 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 							  &sample) < 0)
 				goto out;
 
-			he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1);
+			he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1,
+						0);
 			if (he == NULL)
 				goto out;
 
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c67ecc4..17d9e16 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -111,6 +111,7 @@ struct perf_sample {
 	u64 stream_id;
 	u64 period;
 	u64 weight;
+	u64 transaction;
 	u32 cpu;
 	u32 raw_size;
 	u64 data_src;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0ce9feb..abe69af 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -681,6 +681,9 @@ void perf_evsel__config(struct perf_evsel *evsel,
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
 
+	if (opts->sample_transaction)
+		attr->sample_type	|= PERF_SAMPLE_TRANSACTION;
+
 	/*
 	 * XXX see the function comment above
 	 *
@@ -1470,6 +1473,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->transaction = 0;
+	if (type & PERF_SAMPLE_TRANSACTION) {
+		data->transaction = *array;
+		array++;
+	}
+
 	return 0;
 }
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 46a0d35..4714a72 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -160,6 +160,10 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h) hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3); hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12); hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12); + + if (h->transaction) + hists__new_col_len(hists, HISTC_TRANSACTION, + hist_entry__transaction_len()); } void hists__output_recalc_col_len(struct hists *hists, int max_rows) @@ -466,7 +470,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self, struct hist_entry *__hists__add_entry(struct hists *self, struct addr_location *al, struct symbol *sym_parent, u64 period, - u64 weight) + u64 weight, u64 transaction) { struct hist_entry entry = { .thread = al->thread, @@ -487,6 +491,7 @@ struct hist_entry *__hists__add_entry(struct hists *self, .hists = self, .branch_info = NULL, .mem_info = NULL, + .transaction = transaction, }; return add_hist_entry(self, &entry, al, period, weight); diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index f743e96..6a048c0 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -59,6 +59,7 @@ enum hist_column { HISTC_MEM_TLB, HISTC_MEM_LVL, HISTC_MEM_SNOOP, + HISTC_TRANSACTION, HISTC_NR_COLS, /* Last entry */ }; @@ -84,9 +85,10 @@ struct hists { struct hist_entry *__hists__add_entry(struct hists *self, struct addr_location *al, struct symbol *parent, u64 period, - u64 weight); + u64 weight, u64 transaction); int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right); int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right); +int hist_entry__transaction_len(void); int hist_entry__sort_snprintf(struct hist_entry *self, char *bf, size_t size, struct hists *hists); void hist_entry__free(struct hist_entry *); diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 51f5edf..ef5af04 100644 --- a/tools/perf/util/session.c +++ 
b/tools/perf/util/session.c @@ -855,6 +855,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event, if (sample_type & PERF_SAMPLE_DATA_SRC) printf(" . data_src: 0x%"PRIx64"\n", sample->data_src); + if (sample_type & PERF_SAMPLE_TRANSACTION) + printf("... transaction: %" PRIx64 "\n", sample->transaction); + if (sample_type & PERF_SAMPLE_READ) sample_read__printf(sample, evsel->attr.read_format); } diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index 1771566..729dea3 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -907,6 +907,78 @@ struct sort_entry sort_in_tx = { .se_width_idx = HISTC_IN_TX, }; +static int64_t +sort__transaction_cmp(struct hist_entry *left, struct hist_entry *right) +{ + return left->transaction - right->transaction; +} + +static inline char *add_str(char *p, const char *str) +{ + strcpy(p, str); + return p + strlen(str); +} + +static struct txbit { + unsigned flag; + const char *name; + int skip_for_len; +} txbits[] = { + { PERF_SAMPLE_TXN_ELISION, "EL ", 0 }, + { PERF_SAMPLE_TXN_TRANSACTION, "TX ", 1 }, + { PERF_SAMPLE_TXN_SYNC, "SYNC ", 1 }, + { PERF_SAMPLE_TXN_ASYNC, "ASYNC ", 0 }, + { PERF_SAMPLE_TXN_RETRY, "RETRY ", 0 }, + { PERF_SAMPLE_TXN_CONFLICT, "CON ", 0 }, + { PERF_SAMPLE_TXN_CAPACITY_WRITE, "CAP-WRITE ", 1 }, + { PERF_SAMPLE_TXN_CAPACITY_READ, "CAP-READ ", 0 }, + { 0, NULL, 0 } +}; + +int hist_entry__transaction_len(void) +{ + int i; + int len = 0; + + for (i = 0; txbits[i].name; i++) { + if (!txbits[i].skip_for_len) + len += strlen(txbits[i].name); + } + len += 4; /* :XX<space> */ + return len; +} + +static int hist_entry__transaction_snprintf(struct hist_entry *self, char *bf, + size_t size, unsigned int width) +{ + u64 t = self->transaction; + char buf[128]; + char *p = buf; + int i; + + buf[0] = 0; + for (i = 0; txbits[i].name; i++) + if (txbits[i].flag & t) + p = add_str(p, txbits[i].name); + if (t && !(t & (PERF_SAMPLE_TXN_SYNC|PERF_SAMPLE_TXN_ASYNC))) + p = add_str(p, 
"NEITHER "); + if (t & PERF_SAMPLE_TXN_ABORT_MASK) { + sprintf(p, ":%" PRIx64, + (t & PERF_SAMPLE_TXN_ABORT_MASK) >> + PERF_SAMPLE_TXN_ABORT_SHIFT); + p += strlen(p); + } + + return repsep_snprintf(bf, size, "%-*s", width, buf); +} + +struct sort_entry sort_transaction = { + .se_header = "Transaction ", + .se_cmp = sort__transaction_cmp, + .se_snprintf = hist_entry__transaction_snprintf, + .se_width_idx = HISTC_TRANSACTION, +}; + struct sort_dimension { const char *name; struct sort_entry *entry; @@ -925,6 +997,7 @@ static struct sort_dimension common_sort_dimensions[] = { DIM(SORT_SRCLINE, "srcline", sort_srcline), DIM(SORT_LOCAL_WEIGHT, "local_weight", sort_local_weight), DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight), + DIM(SORT_TRANSACTION, "transaction", sort_transaction), }; #undef DIM diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index 9dad3a0..bf43336 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -85,6 +85,7 @@ struct hist_entry { struct map_symbol ms; struct thread *thread; u64 ip; + u64 transaction; s32 cpu; struct hist_entry_diff diff; @@ -145,6 +146,7 @@ enum sort_type { SORT_SRCLINE, SORT_LOCAL_WEIGHT, SORT_GLOBAL_WEIGHT, + SORT_TRANSACTION, /* branch stack specific sort keys */ __SORT_BRANCH_STACK, -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 6/6] perf, x86: Suppress duplicated abort LBR records
2013-09-13 18:08 perf, x86: Add last TSX PMU code for Haswell Andi Kleen
` (4 preceding siblings ...)
2013-09-13 18:08 ` [PATCH 5/6] perf, tools: Add support for record transaction flags v4 Andi Kleen
@ 2013-09-13 18:08 ` Andi Kleen
2013-09-16 11:28 ` Peter Zijlstra
5 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2013-09-13 18:08 UTC (permalink / raw)
To: mingo; +Cc: peterz, acme, linux-kernel, eranian, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>

Haswell always gives an extra LBR record after every TSX abort.
Suppress the extra record.

This only works when the abort is visible in the LBR.
If the original abort has already left the 16 LBR entries,
the extra entry will stay.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/kernel/cpu/perf_event.h | 1 +
arch/x86/kernel/cpu/perf_event_intel.c | 1 +
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 29 +++++++++++++++++++++--------
3 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index cc16faa..3b303c6 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -440,6 +440,7 @@ struct x86_pmu {
 	int lbr_nr; /* hardware stack size */
 	u64 lbr_sel_mask; /* LBR_SELECT valid bits */
 	const int *lbr_sel_map; /* lbr_select mappings */
+	bool lbr_double_abort; /* duplicated lbr aborts */

 	/*
 	 * Extra registers for events
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 7c53676..9262551 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2515,6 +2515,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		x86_pmu.cpu_events = hsw_events_attrs;
+		x86_pmu.lbr_double_abort = true;
 		pr_cont("Haswell events, ");
 		break;

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d5be06a..9ad9aaf 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -284,6 +284,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
 	int lbr_format = x86_pmu.intel_cap.lbr_format;
 	u64 tos = intel_pmu_lbr_tos();
 	int i;
+	int out = 0;

 	for (i = 0; i < x86_pmu.lbr_nr; i++) {
 		unsigned long lbr_idx = (tos - i) & mask;
@@ -306,15 +307,27 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
 		}
 		from = (u64)((((s64)from) << skip) >> skip);

-		cpuc->lbr_entries[i].from = from;
-		cpuc->lbr_entries[i].to = to;
-		cpuc->lbr_entries[i].mispred = mis;
-		cpuc->lbr_entries[i].predicted = pred;
-		cpuc->lbr_entries[i].in_tx = in_tx;
-		cpuc->lbr_entries[i].abort = abort;
-		cpuc->lbr_entries[i].reserved = 0;
+		/*
+		 * Some CPUs report duplicated abort records,
+		 * with the second entry not having an abort bit set.
+		 * Skip them here. This loop runs backwards,
+		 * so we need to undo the previous record.
+		 * If the abort just happened outside the window
+		 * the extra entry cannot be removed.
+		 */
+		if (abort && x86_pmu.lbr_double_abort && out > 0)
+			out--;
+
+		cpuc->lbr_entries[out].from = from;
+		cpuc->lbr_entries[out].to = to;
+		cpuc->lbr_entries[out].mispred = mis;
+		cpuc->lbr_entries[out].predicted = pred;
+		cpuc->lbr_entries[out].in_tx = in_tx;
+		cpuc->lbr_entries[out].abort = abort;
+		cpuc->lbr_entries[out].reserved = 0;
+		out++;
 	}
-	cpuc->lbr_stack.nr = i;
+	cpuc->lbr_stack.nr = out;
 }

 void intel_pmu_lbr_read(void)
--
1.8.3.1

^ permalink raw reply related [flat|nested] 12+ messages in thread
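The suppression logic added to intel_pmu_lbr_read_64() can be tried outside the kernel. The following is a minimal sketch under stated assumptions: struct lbr_rec and lbr_drop_double_abort() are hypothetical names invented for this example, the record type is reduced to three fields, and the input array is assumed to be ordered newest first, matching the way the read loop walks back from the top of stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an LBR entry; not the kernel's structure. */
struct lbr_rec {
	uint64_t from;
	uint64_t to;
	bool abort;
};

/*
 * Copy in[0..n) to out, newest record first. When a record with the
 * abort bit is seen, the record stored just before it (the next newer
 * one, i.e. the extra record reported after the abort) is overwritten.
 * out must have room for n records. Returns the number of records kept.
 */
static size_t lbr_drop_double_abort(const struct lbr_rec *in, size_t n,
				    struct lbr_rec *out)
{
	size_t i;
	size_t nout = 0;

	assert(in != NULL && out != NULL);

	for (i = 0; i < n; i++) {
		/* Undo the previously stored record, as the patch does. */
		if (in[i].abort && nout > 0)
			nout--;
		out[nout++] = in[i];
	}
	return nout;
}
```

With four input records where the third (index 2) carries the abort bit, three records remain and the record at index 1 is the one dropped. If the abort record is the newest entry, or has already left the window, nothing is removed, which mirrors the limitation stated in the commit message.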
* Re: [PATCH 6/6] perf, x86: Suppress duplicated abort LBR records
2013-09-13 18:08 ` [PATCH 6/6] perf, x86: Suppress duplicated abort LBR records Andi Kleen
@ 2013-09-16 11:28 ` Peter Zijlstra
0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2013-09-16 11:28 UTC (permalink / raw)
To: Andi Kleen; +Cc: mingo, acme, linux-kernel, eranian, Andi Kleen

On Fri, Sep 13, 2013 at 11:08:36AM -0700, Andi Kleen wrote:
> -		cpuc->lbr_entries[i].from = from;
> -		cpuc->lbr_entries[i].to = to;
> -		cpuc->lbr_entries[i].mispred = mis;
> -		cpuc->lbr_entries[i].predicted = pred;
> -		cpuc->lbr_entries[i].in_tx = in_tx;
> -		cpuc->lbr_entries[i].abort = abort;
> -		cpuc->lbr_entries[i].reserved = 0;
> +		cpuc->lbr_entries[out].from = from;
> +		cpuc->lbr_entries[out].to = to;
> +		cpuc->lbr_entries[out].mispred = mis;
> +		cpuc->lbr_entries[out].predicted = pred;
> +		cpuc->lbr_entries[out].in_tx = in_tx;
> +		cpuc->lbr_entries[out].abort = abort;
> +		cpuc->lbr_entries[out].reserved = 0;

just add an extra space before the '=' so they're all nicely aligned
again.

^ permalink raw reply [flat|nested] 12+ messages in thread