From: Jiri Olsa <jolsa@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>,
Don Zickus <dzickus@redhat.com>, Joe Mario <jmario@redhat.com>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Namhyung Kim <namhyung@kernel.org>,
David Ahern <dsahern@gmail.com>, Andi Kleen <andi@firstfloor.org>
Subject: [PATCH 60/61] perf c2c: Add man page and credits
Date: Mon, 19 Sep 2016 15:10:09 +0200
Message-ID: <1474290610-23241-61-git-send-email-jolsa@kernel.org>
In-Reply-To: <1474290610-23241-1-git-send-email-jolsa@kernel.org>
Adding man page for c2c command and credits
to builtin-c2c.c file.
Link: http://lkml.kernel.org/n/tip-twbp391v8v9f5idp584hlfov@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
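Note on the man page text below: the report workflow groups samples by
cacheline address and then breaks each cacheline down by offset. The following
is a minimal C sketch of that address split, not code from builtin-c2c.c. It
assumes 64-byte cachelines, and the CACHELINE_SIZE macro and the helper names
cacheline_addr() and cacheline_offset() are hypothetical, chosen only for
illustration.

```c
#include <stdint.h>

/* Assumption: 64-byte cachelines; the real size depends on the CPU. */
#define CACHELINE_SIZE 64ULL

/* Hypothetical helper: address of the cacheline that contains addr. */
static uint64_t cacheline_addr(uint64_t addr)
{
	return addr & ~(CACHELINE_SIZE - 1);
}

/* Hypothetical helper: byte offset of addr within its cacheline. */
static uint64_t cacheline_offset(uint64_t addr)
{
	return addr & (CACHELINE_SIZE - 1);
}
```

Under that assumption, data addresses 0x1040 and 0x1078 fall into the same
cacheline 0x1040 (offsets 0x0 and 0x38), while 0x1080 starts the next one.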
tools/perf/Documentation/perf-c2c.txt | 276 ++++++++++++++++++++++++++++++++++
tools/perf/builtin-c2c.c | 11 ++
2 files changed, 287 insertions(+)
create mode 100644 tools/perf/Documentation/perf-c2c.txt
diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
new file mode 100644
index 000000000000..ba2f4de399c3
--- /dev/null
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -0,0 +1,276 @@
+perf-c2c(1)
+===========
+
+NAME
+----
+perf-c2c - Shared Data C2C/HITM Analyzer.
+
+SYNOPSIS
+--------
+[verse]
+'perf c2c record' [<options>] <command>
+'perf c2c record' [<options>] -- [<record command options>] <command>
+'perf c2c report' [<options>]
+
+DESCRIPTION
+-----------
+C2C stands for Cache To Cache.
+
+The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
+you to track down cacheline contention.
+
+The tool is based on x86's load latency and precise store facility events
+provided by Intel CPUs. These events provide:
+ - memory address of the access
+ - type of the access (load and store details)
+ - latency (in cycles) of the load access
+
+The c2c tool provides means to record this data and report back access details
+for cachelines with the highest contention - highest number of HITM accesses.
+
+The basic workflow with this tool follows the standard record/report phases.
+The user runs the record command to record event data and the report command
+to display it.
+
+
+RECORD OPTIONS
+--------------
+-e::
+--event=::
+ Select the PMU event. Use 'perf mem record -e list'
+ to list available events.
+
+-v::
+--verbose::
+ Be more verbose (show counter open errors, etc).
+
+-l::
+--ldlat::
+ Configure mem-loads latency.
+
+-k::
+--all-kernel::
+ Configure all used events to run in kernel space.
+
+-u::
+--all-user::
+ Configure all used events to run in user space.
+
+REPORT OPTIONS
+--------------
+-k::
+--vmlinux=<file>::
+ vmlinux pathname
+
+-v::
+--verbose::
+ Be more verbose (show counter open errors, etc).
+
+-i::
+--input::
+ Specify the input file to process.
+
+-N::
+--node-info::
+ Show extra node info in report (see NODE INFO section)
+
+-c::
+--coalesce::
+ Specify sorting fields for single cacheline display.
+ The following fields are available: tid,pid,iaddr,dso
+ (see COALESCE)
+
+-g::
+--call-graph::
+ Set up callchain parameters.
+ Please refer to the perf-report man page for details.
+
+--stdio::
+ Force the stdio output (see STDIO OUTPUT)
+
+--stats::
+ Display only statistic tables and force stdio mode.
+
+--full-symbols::
+ Display full length of symbols.
+
+C2C RECORD
+----------
+The perf c2c record command sets up options related to HITM cacheline analysis
+and calls the standard perf record command.
+
+The following perf record options are configured by default:
+(check perf record man page for details)
+
+ -W,-d,--sample-cpu
+
+Unless specified otherwise with the '-e' option, the following events are
+monitored by default:
+
+ cpu/mem-loads,ldlat=30/P
+ cpu/mem-stores/P
+
+The user can pass any 'perf record' option after the '--' mark, for example
+(to enable callchains and system wide monitoring):
+
+ $ perf c2c record -- -g -a
+
+Please check the RECORD OPTIONS section for specific c2c record options.
+
+C2C REPORT
+----------
+The perf c2c report command displays shared data analysis. It comes in two
+display modes: stdio and tui (default).
+
+The report command workflow is as follows:
+ - sort all the data based on the cacheline address
+ - store access details for each cacheline
+ - sort all cachelines based on user settings
+ - display data
+
+In general the perf c2c report output consists of 2 basic views:
+ 1) most expensive cachelines list
+ 2) offsets details for each cacheline
+
+For each cacheline in the 1) list we display the following data:
+(both stdio and TUI modes output the same fields)
+
+ Index
+ - zero based index to identify the cacheline
+
+ Cacheline
+ - cacheline address (hex number)
+
+ Total records
+ - sum of all cacheline accesses
+
+ Rmt/Lcl Hitm
+ - cacheline percentage of all Remote/Local HITM accesses
+
+ LLC Load Hitm - Total, Lcl, Rmt
+ - count of Total/Local/Remote load HITMs
+
+ Store Reference - Total, L1Hit, L1Miss
+ Total - all store accesses
+ L1Hit - store accesses that hit L1
+ L1Miss - store accesses that missed L1
+
+ Load Dram
+ - count of local and remote DRAM accesses
+
+ LLC Ld Miss
+ - count of all accesses that missed LLC
+
+ Total Loads
+ - sum of all load accesses
+
+ Core Load Hit - FB, L1, L2
+ - count of load hits in FB (Fill Buffer), L1 and L2 cache
+
+ LLC Load Hit - Llc, Rmt
+ - count of LLC and Remote load hits
+
+For each offset in the 2) list we display the following data:
+
+ HITM - Rmt, Lcl
+ - % of Remote/Local HITM accesses for given offset within cacheline
+
+ Store Refs - L1 Hit, L1 Miss
+ - % of store accesses that hit/missed L1 for given offset within cacheline
+
+ Data address - Offset
+ - offset address
+
+ Pid
+ - pid of the process responsible for the accesses
+
+ Tid
+ - tid of the thread responsible for the accesses
+
+ Code address
+ - code address responsible for the accesses
+
+ cycles - rmt hitm, lcl hitm, load
+ - sum of cycles for given accesses - Remote/Local HITM and generic load
+
+ cpu cnt
+ - number of cpus that participated in the access
+
+ Symbol
+ - code symbol related to the 'Code address' value
+
+ Shared Object
+ - shared object name related to the 'Code address' value
+
+ Source:Line
+ - source information related to the 'Code address' value
+
+ Node
+ - nodes participating in the access (see NODE INFO section)
+
+NODE INFO
+---------
+The 'Node' field displays nodes that access the given cacheline
+offset. Its output comes in 3 flavors:
+ - node IDs separated by ','
+ - node IDs with stats for each ID, in the following format:
+ Node{cpus %hitms %stores}
+ - node IDs with list of affected CPUs, in the following format:
+ Node{cpu list}
+
+The user can switch between the above flavors with the -N option or
+use the 'n' key to interactively switch in TUI mode.
+
+COALESCE
+--------
+The user can specify how to sort offsets for a cacheline.
+
+The following fields are available and govern the final
+set of output fields for the cacheline offsets output:
+
+ tid - coalesced by process TIDs
+ pid - coalesced by process PIDs
+ iaddr - coalesced by code address, following fields are displayed:
+ Code address, Code symbol, Shared Object, Source line
+ dso - coalesced by shared object
+
+By default the coalescing is set up with 'pid,tid,iaddr'.
+
+STDIO OUTPUT
+------------
+The stdio output displays data on standard output.
+
+The following tables are displayed:
+ Trace Event Information
+ - overall statistics of memory accesses
+
+ Global Shared Cache Line Event Information
+ - overall statistics on shared cachelines
+
+ Shared Data Cache Line Table
+ - list of most expensive cachelines
+
+ Shared Cache Line Distribution Pareto
+ - list of all accessed offsets for each cacheline
+
+TUI OUTPUT
+----------
+The TUI output provides an interactive interface to navigate
+through the cacheline list and to display offset details.
+
+For details please refer to the help window by pressing '?' key.
+
+CREDITS
+-------
+Although Don Zickus, Dick Fowles and Joe Mario worked together
+to get this implemented, we got lots of early help from Arnaldo
+Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.
+
+C2C BLOG
+--------
+Check Joe's blog on the c2c tool for a detailed use case explanation:
+ https://joemario.github.io/blog/2016/09/01/c2c-blog/
+
+SEE ALSO
+--------
+linkperf:perf-record[1], linkperf:perf-mem[1]
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index e1e74ed27075..61d6abb3713d 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1,3 +1,14 @@
+/*
+ * This is a rewrite of the original c2c tool introduced here:
+ * http://lwn.net/Articles/588866/
+ *
+ * The original tool was changed to fit the current perf state.
+ *
+ * Original authors:
+ * Don Zickus <dzickus@redhat.com>
+ * Dick Fowles <fowles@inreach.com>
+ * Joe Mario <jmario@redhat.com>
+ */
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/stringify.h>
--
2.7.4